Abstract



This document describes a sizable grammar of English written in the TAG formalism and
implemented for use with the XTAG system. This report and the grammar described herein
supersede the TAG grammar described in (XTAG-Group, 1995). The English grammar
described in this report is based on the TAG formalism developed in (Joshi et al., 1975), which has
been extended to include lexicalization (Schabes et al., 1988) and unification-based feature
structures (Vijay-Shanker and Joshi, 1991). The range of syntactic phenomena that can be
handled is large and includes auxiliaries (including inversion), copula, raising and small clause
constructions, topicalization, relative clauses, infinitives, gerunds, passives, adjuncts, it-clefts,
wh-clefts, PRO constructions, noun-noun modifications, extraposition, determiner sequences,
genitives, negation, noun-verb contractions, sentential adjuncts and imperatives. This technical
report corresponds to the XTAG Release 8/31/98. The XTAG grammar is continuously
updated with the addition of new analyses and modification of old ones, and an online version
of this report can be found at the XTAG web page: http://www.cis.upenn.edu/~xtag/.

Chapter 1

Getting Around 



This technical report presents the English XTAG grammar as implemented by the XTAG 
Research Group at the University of Pennsylvania. The technical report is organized into four 
parts, plus a set of appendices. Part 1 contains general information about the XTAG system 
and some of the underlying mechanisms that help shape the grammar. Chapter 2 contains 
an introduction to the formalism behind the grammar and parser, while Chapter 3 contains 
information about the entire XTAG system. Linguists interested solely in the grammar of the
XTAG system may safely skip Chapters 2 and 3. Chapter 4 contains information on some of
the linguistic principles that underlie the XTAG grammar, including the distinction between 
complements and adjuncts, and how case is handled. 

The actual description of the grammar begins with Part 2 and continues through Part 4.
Parts 2 and 3 contain information on the verb classes and the types of trees
allowed within the verb classes, respectively, while Part 4 contains information on trees not
included in the verb classes (e.g. NP's, PP's, various modifiers, etc.). Part 2
contains a table that attempts to provide an overview of the verb classes and tree types by
providing a graphical indication of which tree types are allowed in which verb classes. This table has
been cross-indexed to tree figures shown in the technical report. Part 2 also contains an overview
of all of the verb classes in the XTAG grammar. The rest of Part 2 contains more details on
several of the more interesting verb classes, including ergatives, sentential subjects, sentential
complements, small clauses, ditransitives, and it-clefts.

Part 3 contains information on some of the tree types that are available within the verb 
classes. These tree types correspond to what would be transformations in a movement-based
approach. Not all of these types of trees are contained in all of the verb classes. The table 
(previously mentioned) in Part 2 contains a list of the tree types and indicates which verb 
classes each occurs in. 

Part 4 focuses on the non-verb class trees in the grammar. NP's and determiners are
presented in their own chapter, while the various modifier trees are presented in Chapter 19. Auxiliary
verbs, which are classed separately from the verb classes, are presented in Chapter 20, while
certain types of conjunction are shown in Chapter 21. The XTAG treatment of comparatives
is presented in Chapter 22, and our treatment of punctuation is discussed in Chapter 23.

Throughout the technical report, mention is occasionally made of changes or analyses that
we hope to incorporate in the future. Appendix A details a list of these and other future
work. The appendices also contain information on some of the nitty-gritty details of the
XTAG grammar: a system of metarules which can be used for grammar development
and maintenance (Appendix B), a system for the organization of the grammar in terms of
an inheritance hierarchy, the tree naming conventions used in XTAG, and a comprehensive
list of the features used in the grammar. A further appendix contains an evaluation of the
XTAG grammar, including comparisons with other wide coverage grammars.



Chapter 2 

Feature-Based, Lexicalized Tree 
Adjoining Grammars 



The English grammar described in this report is based on the TAG formalism (Joshi et
al., 1975), which has been extended to include lexicalization (Schabes et al., 1988), and
unification-based feature structures (Vijay-Shanker and Joshi, 1991). Tree Adjoining Languages
(TALs) fall into the class of mildly context-sensitive languages, and as such are more
powerful than context-free languages. The TAG formalism in general, and lexicalized TAGs
in particular, are well-suited for linguistic applications. As first shown by (Joshi, 1985) and
(Kroch and Joshi, 1987), the properties of TAGs permit us to encapsulate diverse syntactic
phenomena in a very natural way. For example, TAG's extended domain of locality and its
factoring of recursion from local dependencies lead, among other things, to a localization of
so-called unbounded dependencies.



2.1 TAG formalism 

The primitive elements of the standard TAG formalism are known as elementary trees. Elementary
trees are of two types: initial trees and auxiliary trees (see Figure 2.1). In describing
natural language, initial trees are minimal linguistic structures that contain no recursion, 
i.e. trees containing the phrasal structure of simple sentences, NP's, PP's, and so forth. Initial 
trees are characterized by the following: 1) all internal nodes are labeled by non-terminals, 2) 
all leaf nodes are labeled by terminals, or by non-terminal nodes marked for substitution. An 
initial tree is called an X-type initial tree if its root is labeled with type X. 

Recursive structures are represented by auxiliary trees, which represent constituents that 
are adjuncts to basic structures (e.g. adverbials). Auxiliary trees are characterized as follows: 
1) all internal nodes are labeled by non-terminals, 2) all leaf nodes are labeled by terminals, 
or by non-terminal nodes marked for substitution, except for exactly one non-terminal node, 
called the foot node, which can only be used to adjoin the tree to another node¹, 3) the foot
node has the same label as the root node of the tree. 

1 A null adjunction constraint (NA) is systematically put on the foot node of an auxiliary tree. This disallows
adjunction of a tree onto the foot node itself.

Figure 2.1: Elementary trees in TAG (panels: Initial Tree, Auxiliary Tree)



There are two operations defined in the TAG formalism, substitution² and adjunction. In
the substitution operation, the root node on an initial tree is merged into a non-terminal leaf
node marked for substitution in another initial tree, producing a new tree. The root node and
the substitution node must have the same name. Figure 2.2 shows two initial trees and the tree
resulting from the substitution of one tree into the other.




Figure 2.2: Substitution in TAG
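
The substitution operation can be illustrated with a minimal Python sketch. It is not the XTAG implementation: the `Node` class, the `kind` field and the helper names are assumptions made for this example, and terminals are simply modeled as leaf nodes whose label is the word.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                       # non-terminal name, or the word for a terminal leaf
    children: List["Node"] = field(default_factory=list)
    kind: str = "plain"              # "plain", or "subst" for a leaf marked for substitution


def substitute(target: Node, initial: Node) -> Node:
    """Return a copy of `target` in which the leftmost substitution node with
    the same label as the root of `initial` is replaced by `initial`."""
    result = copy.deepcopy(target)

    def fill(node: Node) -> bool:
        for i, child in enumerate(node.children):
            if child.kind == "subst" and child.label == initial.label:
                node.children[i] = copy.deepcopy(initial)
                return True
            if fill(child):
                return True
        return False

    if not fill(result):
        raise ValueError(f"no substitution node labeled {initial.label}")
    return result


def frontier(node: Node) -> List[str]:
    """Labels of the leaves, read left to right."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in frontier(child)]


# An S-type initial tree with an NP substitution site, and an NP-type initial tree.
walked = Node("S", [Node("NP", kind="subst"), Node("VP", [Node("V", [Node("walked")])])])
john = Node("NP", [Node("N", [Node("John")])])
derived = substitute(walked, john)
print(frontier(derived))  # ['John', 'walked']
```

The label check in `fill` corresponds to the requirement that the root node and the substitution node have the same name.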



In an adjunction operation, an auxiliary tree is grafted onto a non-terminal node anywhere
in an initial tree. The root and foot nodes of the auxiliary tree must match the node at which
the auxiliary tree adjoins. Figure 2.3 shows an auxiliary tree and an initial tree, and the tree
resulting from an adjunction operation.
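
Adjunction can be sketched in the same style. Again this is only an illustration under assumed names (`Node`, `adjoin`, `find_foot`); the adjunction site is given as a path of child indices from the root, which is a choice made for this example rather than anything prescribed by the formalism.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    kind: str = "plain"              # "plain" or "foot"


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, if there is one."""
    if node.kind == "foot":
        return node
    for child in node.children:
        foot = find_foot(child)
        if foot is not None:
            return foot
    return None


def adjoin(target: Node, aux: Node, address: Tuple[int, ...]) -> Node:
    """Return a copy of `target` with `aux` adjoined at the node reached by
    following the child indices in `address` from the root."""
    result, aux = copy.deepcopy(target), copy.deepcopy(aux)
    site = result
    for index in address:
        site = site.children[index]
    foot = find_foot(aux)
    if foot is None or not (site.label == aux.label == foot.label):
        raise ValueError("root and foot labels must match the adjunction site")
    # The subtree that hung below the site now hangs below the foot node ...
    foot.children, foot.kind = site.children, "plain"
    # ... and the body of the auxiliary tree takes its place at the site.
    site.children = aux.children
    return result


def frontier(node: Node) -> List[str]:
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in frontier(child)]


sentence = Node("S", [Node("NP", [Node("N", [Node("John")])]),
                      Node("VP", [Node("V", [Node("walked")])])])
# Auxiliary tree VP -> VP* Ad, anchored by an adverb.
quickly = Node("VP", [Node("VP", kind="foot"), Node("Ad", [Node("quickly")])])
derived = adjoin(sentence, quickly, (1,))
print(frontier(derived))  # ['John', 'walked', 'quickly']
```

Because the root and foot of the auxiliary tree carry the same label as the site, the operation can be repeated at the new VP nodes, which is how recursion is factored out of the initial trees.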

A TAG G is a collection of finite initial trees, I, and auxiliary trees, A. The TREE SET of
a TAG G, T(G), is defined to be the set of all derived trees starting from S-type initial trees
in I whose frontier consists of terminal nodes (all substitution nodes having been filled). The
string language generated by a TAG, L(G), is defined to be the set of all terminal strings
on the frontier of the trees in T(G).



2 Technically, substitution is a specialized version of adjunction, but it is useful to make a distinction between
the two.



2.2 Lexicalization



'Lexicalized' grammars systematically associate each elementary structure with a lexical anchor. 
This means that in each structure there is a lexical item that is realized. It does not mean simply 
adding feature structures (such as head) and unification equations to the rules of the formalism. 
These resultant elementary structures specify extended domains of locality (as compared to 
CFGs) over which constraints can be stated. 

Following (Schabes et al., 1988), we say that a grammar is lexicalized if it consists of 1)
a finite set of structures each associated with a lexical item, and 2) an operation or operations 
for composing the structures. Each lexical item will be called the anchor of the corresponding 
structure, which defines the domain of locality over which constraints are specified. Note then, 
that constraints are local with respect to their anchor. 

Not every grammar is in a lexicalized form.³ In the process of lexicalizing a grammar, the
lexicalized grammar is required to be strongly equivalent to the original grammar, i.e. it must 
produce not only the same language, but the same structures or tree set as well. 



Figure 2.4: Lexicalized Elementary trees (panels (a) to (d), anchored by John, walked, to, and Philadelphia)



3 Notice the similarity of the definition of a lexicalized grammar with the offline parsability constraint (Kaplan
and Bresnan, 1983). As consequences of our definition, each structure has at least one lexical item (its anchor)
attached to it and all sentences are finitely ambiguous.

In Figure 2.4, which shows sample initial and auxiliary trees, substitution sites are marked
by a ↓, and foot nodes are marked by an *. This notation is standard and is followed in the
rest of this report. 



2.3 Unification-based features 

In a unification framework, a feature structure is associated with each node in an elementary 
tree. This feature structure contains information about how the node interacts with other 
nodes in the tree. It consists of a top part, which generally contains information relating to the 
supernode, and a bottom part, which generally contains information relating to the subnode. 
Substitution nodes, however, have only the top features, since the tree substituting in logically 
carries the bottom features. 




Figure 2.5: Substitution in FB-LTAG 



The notions of substitution and adjunction must be augmented to fit within this new frame- 
work. The feature structure of a new node created by substitution inherits the union of the 
features of the original nodes. The top feature of the new node is the union of the top features 
of the two original nodes, while the bottom feature of the new node is simply the bottom feature 
of the top node of the substituting tree (since the substitution node has no bottom feature). 
Figure 2.5⁴ shows this more clearly.



Adjunction is only slightly more complicated. The node being adjoined into splits, and its
top feature unifies with the top feature of the root adjoining node, while its bottom feature
unifies with the bottom feature of the foot adjoining node. Again, this is more easily shown graphically,
as in Figure 2.6.⁵
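
The way top and bottom features combine under the two operations can be sketched as follows. This is a deliberately simplified illustration, not the XTAG unifier: feature structures are assumed to be flat dictionaries of atomic values, with no nesting, co-indexing or disjunctive values, and the function names are invented for this example. The parameter names follow the abbreviations used in the figures (t, b, tr, br, tf, bf).

```python
from typing import Dict, Optional, Tuple

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Unify two flat feature structures; None signals a clash."""
    merged = dict(a)
    for name, value in b.items():
        if merged.get(name, value) != value:
            return None
        merged[name] = value
    return merged


def substitute_features(t: Features, tr: Features,
                        br: Features) -> Optional[Tuple[Features, Features]]:
    """Node created by substitution: top is t unified with tr, bottom is br."""
    top = unify(t, tr)
    return None if top is None else (top, br)


def adjoin_features(t: Features, b: Features, tr: Features, br: Features,
                    tf: Features, bf: Features):
    """Adjunction splits the node (t, b) into a root node (t unified with tr, br)
    and a foot node (tf, b unified with bf)."""
    root_top, foot_bottom = unify(t, tr), unify(b, bf)
    if root_top is None or foot_bottom is None:
        return None
    return (root_top, br), (tf, foot_bottom)


# A nominative NP site filled by a tree whose root is third person singular.
print(substitute_features({"case": "nom"}, {"agr": "3sg"}, {"wh": "-"}))
# Clashing values block the operation.
print(unify({"mode": "ind"}, {"mode": "inf"}))
```

A failed unification (here `None`) is what rules out a substitution or adjunction whose features are incompatible with the site.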



Figure 2.6: Adjunction in FB-LTAG

The embedding of the TAG formalism in a unification framework allows us to dynamically
specify local constraints that would otherwise have had to be stated statically within
the trees. Constraints that verbs make on their complements, for instance, can be implemented
through the feature structures. The notions of Obligatory and Selective Adjunction, crucial
to the formation of lexicalized grammars, can also be handled through the use of features.⁶
Perhaps more important to developing a grammar, though, is that the trees can serve as
schemata to be instantiated with lexical-specific features when an anchor is associated with the
tree. To illustrate this, Figure 2.7 shows the same tree lexicalized with two different verbs, each
of which instantiates the features of the tree according to its lexical selectional restrictions.

4 Abbreviations in the figure: t=top feature structure, tr=top feature structure of the root, br=bottom feature
structure of the root, U=unification

5 Abbreviations in the figure: t=top feature structure, b=bottom feature structure, tr=top feature structure
of the root, br=bottom feature structure of the root, tf=top feature structure of the foot, bf=bottom feature
structure of the foot, U=unification

In Figure 2.7, the lexical item thinks takes an indicative sentential complement, as in the
sentence John thinks that Mary loves Sally. Want takes a sentential complement as well, but an 
infinitive one, as in John wants to love Mary. This distinction is easily captured in the features 
and passed to other nodes to constrain which trees this tree can adjoin into, both cutting down 
the number of separate trees needed and enforcing conceptual Selective Adjunctions (SA). 



6 The remaining constraint, Null Adjunction (NA), must still be specified directly on a node.

Figure 2.7: Lexicalized Elementary Trees with Features (panels: think tree, want tree)



Chapter 3 

Overview of the XTAG System 



This section focuses on the various components that comprise the parser and English grammar 
in the XTAG system. Persons interested only in the linguistic analyses in the grammar may 
skip this section without loss of continuity, although a quick glance at the tagset used in XTAG 
and the set of non-terminal labels used will be useful. We may occasionally refer back to the 
various components mentioned in this section. 



3.1 System Description 



Figure 3.1 shows the overall flow of the system when parsing a sentence; a summary of each
component is presented in Table 3.1. At the heart of the system is a parser for lexicalized
TAGs (Schabes and Joshi, 1988; Schabes, 1990) which produces all legitimate parses for the
sentence. The parser has two phases: Tree Selection and Tree Grafting.



3.1.1 Tree Selection 

Since we are working with lexicalized TAGs, each word in the sentence selects at least one 
tree. The advantage of a lexicalized formalism like LTAGs is that rather than parsing with all 
the trees in the grammar, we can parse with only the trees selected by the words in the input 
sentence. 

In the XTAG system, the selection of trees by the words is done in several steps. Each 
step attempts to reduce ambiguity, i.e. reduce the number of trees selected by the words in the 
sentence. 



Morphological Analysis and POS Tagging The input sentence is first submitted to the
Morphological Analyzer and the Tagger. The morphological analyzer (Karp et al.,
1992) consists of a disk-based database (a compiled version of the derivational rules)
which is used to map an inflected word into its stem, part of speech and feature equations
corresponding to inflectional information. These features are inserted at the anchor node
of the tree eventually selected by the stem. The POS Tagger can be disabled, in which
case only information from the morphological analyzer is used. The morphology data
was originally extracted from the Collins English Dictionary (Hanks, 1979) and Oxford
Advanced Learner's Dictionary (Hornby, 1974) available through ACL-DCI (Liberman,
1989), and then cleaned up and augmented by hand (Karp et al., 1992).

Figure 3.1: Overview of XTAG system (flow from Input Sentence to Derivation Structure)



POS Blender The output from the morphological analyzer and the POS tagger go into the 
POS Blender which uses the output of the POS tagger as a filter on the output of the 
morphological analyzer. Any words that are not found in the morphological database are 
assigned the POS given by the tagger. 

Syntactic Database The syntactic database contains the mapping between particular stem(s)
and the tree templates or tree-families stored in the Tree Database (see Table 3.1). The
syntactic database also contains a list of feature equations that capture lexical idiosyncrasies.
The output of the POS Blender is used to search the Syntactic Database to
produce a set of lexicalized trees with the feature equations associated with the word(s)
in the syntactic database unified with the feature equations associated with the trees.
Note that the features in the syntactic database can be assigned to any node in the tree
and not just to the anchor node. The syntactic database entries were originally extracted
from the Oxford Advanced Learner's Dictionary (Hornby, 1974) and Oxford Dictionary
for Contemporary Idiomatic English (Cowie and Mackin, 1975) available through ACL-DCI
(Liberman, 1989), and then modified and augmented by hand (Egedi and Martin,
1994). There are more than 31,000 syntactic database entries.¹ Selected entries from
this database are shown in Table 3.2.



Default Assignment For words that are not found in the syntactic database, default trees 
and tree-families are assigned based on their POS tag. 



Filters Some of the lexicalized trees chosen in previous stages can be eliminated in order to
reduce ambiguity. Two methods are currently used: structural filters, which eliminate
trees which have impossible spans over the input sentence, and a statistical filter based on
unigram probabilities of non-lexicalized trees (from a hand-corrected set of approximately
6000 parsed sentences). These methods speed the runtime by approximately 87%.

Supertagging Before parsing, one can optionally supertag the sentence.
This step uses statistical disambiguation to assign a unique elementary tree (or supertag)
to each word in the sentence. These assignments can then be hand-corrected. These
supertags are used as a filter on the tree assignments made so far. More information on
supertagging can be found in (Srinivas, 1997a; Srinivas, 1997b).

1 This number does not include trees assigned by default based on the part-of-speech of the word.

Component                 Details
------------------------  ------------------------------------------------------------
Morphological Analyzer    Consists of approximately 317,000 inflected items
and Morph Database        derived from over 90,000 stems.
                          Entries are indexed on the inflected form and return
                          the root form, POS, and inflectional information.

POS Tagger and            Wall Street Journal-trained trigram tagger (Church, 1988)
Lex Prob Database         extended to output N-best POS sequences
                          (Soong and Huang, 1990). Decreases the time to parse
                          a sentence by an average of 93%.

Syntactic Database        More than 30,000 entries.
                          Each entry consists of: the uninflected form of the word,
                          its POS, the list of trees or tree-families associated with
                          the word, and a list of feature equations that capture
                          lexical idiosyncrasies.

Tree Database             1094 trees, divided into 52 tree families and 218 individual
                          trees. Tree families represent subcategorization frames;
                          the trees in a tree family would be related to each other
                          transformationally in a movement-based approach.

X-Interface               Menu-based facility for creating and modifying tree files.
                          User controlled parser parameters: parser's start category,
                          enable/disable/retry on failure for POS tagger.
                          Storage/retrieval facilities for elementary and parsed trees.
                          Graphical displays of tree and feature data structures.
                          Hand combination of trees by adjunction or substitution
                          for grammar development.
                          Ability to manually assign POS tag
                          and/or Supertag before parsing.

Table 3.1: System Summary
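
The first tree selection steps of Section 3.1.1 (morphological look-up, POS blending, syntactic database look-up and default assignment) can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the three small dictionaries are toy data, the default tree names are placeholders rather than real XTAG tree names, and the report does not say what happens when the tagger filter removes every morphological analysis, so this sketch simply returns nothing in that case.

```python
from typing import Dict, List, Set, Tuple

Analysis = Tuple[str, str, str]          # (stem, POS, inflectional information)

# Toy stand-ins for the morphological and syntactic databases (invented data).
MORPH_DB: Dict[str, List[Analysis]] = {
    "cooed": [("coo", "V", "PAST WK"), ("coo", "V", "PPART WK")],
    "company": [("company", "N", "3sg"), ("company", "V", "INF")],
}
SYN_DB: Dict[Tuple[str, str], List[str]] = {
    ("coo", "V"): ["Tnx0V"],
    ("engross", "V"): ["Tnx0Vnx1"],
}
# Hypothetical placeholder names for the default assignments per POS tag.
DEFAULTS: Dict[str, List[str]] = {"N": ["default-N"], "V": ["default-V"]}


def pos_blend(word: str, tagger_tags: Set[str]) -> List[Analysis]:
    """Use the tagger output as a filter on the morphological analyses.
    Words missing from the morphological database get the tagger's POS."""
    analyses = MORPH_DB.get(word, [])
    if not analyses:
        return [(word, tag, "") for tag in sorted(tagger_tags)]
    return [a for a in analyses if a[1] in tagger_tags]


def select_trees(word: str, tagger_tags: Set[str]) -> List[Tuple[str, str]]:
    """Return (stem, tree or family name) pairs selected by a word."""
    selected: List[Tuple[str, str]] = []
    for stem, pos, _ in pos_blend(word, tagger_tags):
        # Fall back to the POS-based defaults when the stem has no entry.
        names = SYN_DB.get((stem, pos)) or DEFAULTS.get(pos, [])
        for name in names:
            if (stem, name) not in selected:
                selected.append((stem, name))
    return selected


print(select_trees("cooed", {"V"}))      # [('coo', 'Tnx0V')]
print(select_trees("company", {"N"}))    # [('company', 'default-N')]
```

The later steps (structural and statistical filters, supertagging) would further prune the list returned by `select_trees`; they are left out because the report gives no detail from which to sketch them.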



3.1.2 Tree Database 

The Tree Database contains the tree templates that are lexicalized by following the various
steps given above. The lexical items are inserted into distinguished nodes in the tree template
called the anchor nodes. The part of speech of each word in the sentence corresponds to the
label of the anchor node of the trees. Hence the tagset used by the POS Tagger corresponds
exactly to the labels of the anchor nodes in the trees. The tagset used in the XTAG system is
given in Table 3.3. The tree templates are subdivided into tree families (for verbs and other
predicates), and tree files which are simply collections of trees for lexical items like prepositions,
determiners, etc.²

«INDEX»porousness«ENTRY»porousness«POS»N
«TREES»^BNXN ^BN ^CNn
«FEATURES»#N_card- #N_const- #N_decreas- #N_definite- #N_gen-
#N_quan- #N_refl-

«INDEX»coo«ENTRY»coo«POS»V«FAMILY»Tnx0V

«INDEX»engross«ENTRY»engross«POS»V«FAMILY»Tnx0Vnx1
«FEATURES»#TRANS+

«INDEX»forbear«ENTRY»forbear«POS»V«FAMILY»Tnx0Vs1
«FEATURES»#S1_WH- #S1_inf_for_nil

«INDEX»have«ENTRY»have«POS»V«ENTRY»out«POS»PL
«FAMILY»Tnx0Vplnx1

Table 3.2: Example Syntactic Database Entries.
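
Entries laid out like those in Table 3.2 can be read with a few lines of Python. The sketch below assumes only what the table shows (field names in «...» delimiters, one or more ENTRY/POS pairs per record, whitespace-separated tree, family and feature names); it is not based on a specification of the actual database format, and `parse_entry` is a name invented for this example.

```python
import re
from typing import Dict, List, Optional

# A field is a name in «...» followed by everything up to the next field.
FIELD = re.compile(r"«([A-Z]+)»([^«]*)")


def parse_entry(text: str) -> Dict[str, object]:
    """Parse one record into its index, (word, POS) pairs and name lists."""
    record: Dict[str, object] = {
        "INDEX": None, "ENTRIES": [], "TREES": [], "FAMILY": [], "FEATURES": [],
    }
    pending_word: Optional[str] = None
    for name, value in FIELD.findall(text):
        value = value.strip()
        if name == "INDEX":
            record["INDEX"] = value
        elif name == "ENTRY":
            pending_word = value                 # paired with the POS that follows
        elif name == "POS":
            record["ENTRIES"].append((pending_word, value))
            pending_word = None
        else:
            # TREES, FAMILY and FEATURES hold whitespace-separated names.
            record.setdefault(name, []).extend(value.split())
    return record


multiword = parse_entry("«INDEX»have«ENTRY»have«POS»V«ENTRY»out«POS»PL\n«FAMILY»Tnx0Vplnx1")
print(multiword["ENTRIES"])   # [('have', 'V'), ('out', 'PL')]
print(multiword["FAMILY"])    # ['Tnx0Vplnx1']
```

Keeping the (word, POS) pairs as a list lets a multi-word entry such as the verb-particle combination above carry both of its anchors.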



3.1.3 Tree Grafting 

Once a particular set of lexicalized trees for the sentence has been selected, XTAG uses an
Earley-style predictive left-to-right parsing algorithm for LTAGs (Schabes and Joshi, 1988;
Schabes, 1990) to find all derivations for the sentence. The derivation trees and the associated
derived trees can be viewed using the X-interface (see Table 3.1). The X-interface can also be
used to save particular derivations to disk.



The output of the parser for the sentence I had a map yesterday is illustrated in Figure 3.2.
The parse tree³ represents the surface constituent structure, while the derivation tree represents
the derivation history of the parse. The nodes of the derivation tree are the tree names anchored
by the lexical items⁴. The composition operation is indicated by the nature of the arcs: a dashed
line is used for substitution and a bold line for adjunction. The number beside each tree name 
is the address of the node at which the operation took place. The derivation tree can also be 
interpreted as a dependency graph with unlabeled arcs between words of the sentence. 
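
A derivation tree and its reading as a dependency graph can be modeled with a small data structure. The sketch below is illustrative only: the `DerivNode` class is invented, the node labels are taken from the Figure 3.2 derivation tree, and the parent-child arrangement used here is an assumption, since only the node labels and addresses are given. It also assumes the XTAG naming convention that names starting with α denote initial trees and names starting with β denote auxiliary trees.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DerivNode:
    tree: str                            # elementary tree name
    anchor: str                          # lexical item anchoring the tree
    address: str = ""                    # node address in the parent tree ("" for the root)
    children: List["DerivNode"] = field(default_factory=list)

    @property
    def operation(self) -> Optional[str]:
        """How this tree was combined with its parent (None for the root)."""
        if not self.address:
            return None
        return "adjunction" if self.tree.startswith("β") else "substitution"


def dependencies(node: DerivNode) -> List[Tuple[str, str]]:
    """Unlabeled (head word, dependent word) arcs of the derivation tree."""
    arcs: List[Tuple[str, str]] = []
    for child in node.children:
        arcs.append((node.anchor, child.anchor))
        arcs.extend(dependencies(child))
    return arcs


root = DerivNode("αnx0Vnx1", "had", children=[
    DerivNode("αNXN", "I", "1"),
    DerivNode("βvxPnx", "on", "2", [
        DerivNode("αNXN", "desk", "2.2", [DerivNode("βDnx", "my", "0")]),
    ]),
    DerivNode("αNXN", "map", "2.2", [DerivNode("βDnx", "a", "0")]),
])
print(dependencies(root))
```

Each arc pairs the anchor of a tree with the anchor of a tree that was substituted or adjoined into it, which is the sense in which the derivation tree can be read as a dependency graph.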



2 The nonterminals in the tree database are A, AP, Ad, AdvP, Comp, Conj, D, N, NP, P, PP, Punct, S,
V, VP.

3 The feature structures associated with each node of the parse tree are not shown here.
4 The appendix on tree naming conventions explains the conventions used in naming the trees.



Figure 3.2: Output Structures from the Parser (panels: Parse Tree, Derivation Tree)
Derivation tree nodes, with addresses in parentheses: αnx0Vnx1[had]; αNXN[I] (1); βvxPnx[on] (2); αNXN[map] (2.2); αNXN[desk] (2.2); βDnx[my] (0); βDnx[a] (0)



Part of Speech   Description
--------------   --------------
A                Adjective
Ad               Adverb
Comp             Complementizer
D                Determiner
G                Genitive Noun
I                Interjection
N                Noun
P                Preposition
PL               Particle
Punct            Punctuation
V                Verb

Table 3.3: XTAG tagset



3.1.4 The Grammar Development Environment 

Working with and developing a large grammar is a challenging process, and the importance of
having good visualization tools cannot be over-emphasized. Currently the XTAG system has
X-windows based tools for viewing and updating the morphological and syntactic databases
(Karp et al., 1992; Egedi and Martin, 1994). These are available in both ASCII and binary-encoded
database format. The ASCII format is well-suited for various UNIX utilities (awk,
sed, grep) while the database format is used for fast access during program execution. However,
even the ASCII formatted representation is not well-suited for human readability. An X-windows
interface for the databases allows users to easily examine them. Searching for specific
information on certain fields of the syntactic database is also available. Also, the interface allows
a user to insert, delete and update any information in the databases. Figure 3.3(a) shows the
interface for the morphology database and Figure 3.3(b) shows the interface for the syntactic
database.



Figure 3.3: Interfaces to the database maintenance tools (panels: (a) Morphology database, (b) Syntactic database)
Panel (a) shows look-ups such as company (company N 3sg; company V INF), being (being N 3sg; be V PROG), and acquired (acquire V PPART WK; acquire V PAST WK). Panel (b) shows the record for the noun company.



XTAG also has a parsing and grammar development interface (Paroubek et al., 1992). This
interface includes a tree editor, the ability to vary parameters in the parser, work with multiple
grammars and/or parsers, and use metarules for more efficient tree editing and construction
(Becker, 1994). The interface is shown in Figure 3.4. It has the following features:



• Menu-based facility for creating and modifying tree files and loading grammar files. 

• User controlled parser parameters, including the root category (main S, embedded S, NP, 
etc.), and the use of the tagger (on/off/retry on failure).

• Storage/retrieval facilities for elementary and parsed trees. 

• The production of postscript files corresponding to elementary and parsed trees. 

• Graphical displays of tree and feature data structures, including a scroll 'web' for large 
tree structures. 

• Mouse-based tree editor for creating and modifying trees and feature structures. 

• Hand combination of trees by adjunction or substitution for use in diagnosing grammar 
problems. 

• Metarule tool for automatic aid to the generation of trees by using tree-based transformation
rules.

Figure 3.4: Interface to the XTAG system



3.2 Computer Platform 

XTAG was developed on the Sun SPARC station series. It has been tested on various Sun
platforms including Ultra-1 and Ultra-Enterprise. XTAG is freely available from the XTAG web
page at http://www.cis.upenn.edu/~xtag/. It requires 75 MB of disk space (once all binaries
and databases are created after the install). XTAG requires the following software to run:

• A machine running UNIX and X11R4 (or higher). Previous releases of X will not work.
X11R4 is free software which usually comes bundled with your OS. It is also freely available
for various platforms at http://www.xfree86.org/

• A Common Lisp compiler which supports the latest definition of Common Lisp (Steele's 
Common Lisp, second edition). XTAG has been tested on Lucid Common Lisp/SPARC 
Solaris, Version: 4.2.1. Allegro CL is no longer directly supported; however, there have
been third-party ports to recent versions of Allegro CL.

• CLX version 4 or higher. CLX is the Lisp equivalent to the Xlib package written in C. 

• Mark Kantrowitz's Lisp Utilities from CMU: logical-pathnames and defsystem. 

A patched version of CLX (Version 5.02) for SunOS 5.5.1 and the CMU Lisp Utilities are 
provided in our ftp directory for your convenience. However, we ask that you refer to the 
appropriate sources for updates. 

The morphology database component (Karp et al., 1992), no longer under licensing restrictions,
is available as a separate download from the XTAG web page (see above for URL).

The syntactic database component is also available as part of the XTAG system (Egedi
and Martin, 1994).

More information can be obtained on the XTAG web page at
http://www.cis.upenn.edu/~xtag/.



Chapter 4 

Underview 



The morphology, syntactic, and tree databases together comprise the English grammar. A 
lexical item that is not in the databases receives a default tree selection and features for its 
part of speech and morphology. In designing the grammar, a decision was made early on to 
err on the side of acceptance whenever there are conflicting opinions as to whether or not a 
construction is grammatical. In this sense, the XTAG English grammar is intended to function 
primarily as an acceptor rather than a generator of English sentences. The range of syntactic 
phenomena that can be handled is large and includes auxiliaries (including inversion), cop- 
ula, raising and small clause constructions, topicalization, relative clauses, infinitives, gerunds, 
passives, adjuncts, it-clefts, wh-clefts, PRO constructions, noun-noun modifications, extrapo- 
sition, determiner sequences, genitives, negation, noun-verb contractions, clausal adjuncts and 
imperatives. 

4.1 Subcategorization Frames 

Elementary trees for non-auxiliary verbs are used to represent the linguistic notion of subcate- 
gorization frames. The anchor of the elementary tree subcategorizes for the other elements that 
appear in the tree, forming a clausal or sentential structure. Tree families group together trees 
belonging to the same subcategorization frame. Consider the following uses of the verb buy: 

(1) Srini bought a book. 

(2) Srini bought Beth a book. 

In sentence (1), the verb buy subcategorizes for a direct object NP. The elementary tree
anchored by buy is shown in Figure 4.1(a) and includes nodes for the NP complement of buy and
for the NP subject. In addition to this declarative tree structure, the tree family also contains
the trees that would be related to each other transformationally in a movement based approach,
i.e., passivization, imperatives, wh-questions, relative clauses, and so forth. Sentence (2) shows
that buy also subcategorizes for a double NP object. This means that buy also selects the
double NP object subcategorization frame, or tree family, with its own set of transformationally
related sentence structures. Figure 4.1(b) shows the declarative structure for this set of sentence
structures.

(a) tree with one NP complement, anchored by bought   (b) tree with two NP complements, anchored by bought

Figure 4.1: Different subcategorization frames for the verb buy
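The selection of tree families by a verb can be pictured with a minimal sketch. The family names Tnx0Vnx1 (transitive) and Tnx0Vnx1nx2 (ditransitive) follow the tree family table of this report; the dictionaries, the function name and the short tree lists are hypothetical and only illustrate the idea that a lexical entry names a family rather than individual trees.

```python
# Hypothetical, heavily abbreviated tree families: the real families in the
# grammar contain many more trees than the handful listed here.
TREE_FAMILIES = {
    "Tnx0Vnx1": ["declarative", "wh-moved subject", "wh-moved object",
                 "passive", "imperative"],
    "Tnx0Vnx1nx2": ["declarative", "wh-moved subject", "wh-moved object",
                    "passive", "imperative"],
}

# Hypothetical syntactic lexicon: each verb lists the families it selects.
LEXICON = {
    "buy": ["Tnx0Vnx1", "Tnx0Vnx1nx2"],  # "bought a book", "bought Beth a book"
}


def trees_for(verb):
    """Return (family, tree) pairs for every tree the verb can anchor."""
    return [(family, tree)
            for family in LEXICON.get(verb, [])
            for tree in TREE_FAMILIES[family]]


if __name__ == "__main__":
    for family, tree in trees_for("buy"):
        print(family, tree)
```

Because buy names two families, it gains every tree in both without listing them one by one.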



4.2 Complements and Adjuncts 

Complements and adjuncts have very different structures in the XTAG grammar. Complements 
are included in the elementary tree anchored by the verb that selects them, while adjuncts do 
not originate in the same elementary tree as the verb anchoring the sentence, but are instead 
added to a structure by adjunction. The contrasts between complements and adjuncts have been 
extensively discussed in the linguistics literature and the classification of a given element as one 
or the other remains a matter of debate (see [Rizzi, 1990], [Larson, 1988], [Jackendoff, 1990],
[Larson, 1990], [Cinque, 1990], [Obernauer, 1984], [Lasnik and Saito, 1984], and [Chomsky,
1986]). The guiding rule used in developing the XTAG grammar is whether or not the sentence
is ungrammatical without the questioned structure.^1 Consider the following sentences:



(3) Srini bought a book. 



(4) Srini bought a book at the bookstore. 



(5) Srini arranged for a ride. 



(6) *Srini arranged. 

Prepositional phrases frequently occur as adjuncts, and when they are used as adjuncts
they have a tree structure such as that shown in Figure 4.2(a). This adjunction tree would
adjoin into the tree shown in Figure 4.1(a) to generate sentence (4). There are verbs, however,
such as arrange, hunger and differentiate, that take prepositional phrases as complements.
Sentences (5) and (6) clearly show that the prepositional phrase is not optional for arrange.
For these sentences, the prepositional phrase will be an initial tree (as shown in Figure 4.2(b))
that substitutes into an elementary tree, such as the one anchored by the verb arrange in
Figure 4.2(c).

1 Iteration of a structure can also be used as a diagnostic: Srini bought a book at the bookstore on Walnut
Street for a friend.

(a) auxiliary tree anchored by at, with a VP foot node   (b) initial PP tree anchored by for   (c) tree anchored by arranged, with a PP substitution node

Figure 4.2: Trees illustrating the difference between Complements and Adjuncts



Virtually all parts of speech, except for main verbs, function as both complements and
adjuncts in the grammar. More information is available in this report on various parts of
speech as complements: adjectives (e.g. section |B.13|), nouns (e.g. section |fT2|), and prepositions
(e.g. section 3.10); and as adjuncts: adjectives (section 19.1), adverbs (section 19.5), nouns
(section 19.2), and prepositions (section 19.4).
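The structural difference between complements and adjuncts can be made concrete with a small sketch of the two TAG operations. The Node class and every function name below are hypothetical illustrations, not part of the XTAG system, and the trees are reduced to bare category labels without features. The sketch builds sentence (4) by adjunction and sentence (5) by substitution.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set on lexical leaves
    subst: bool = False         # open substitution node, e.g. an NP to be filled
    foot: bool = False          # foot node of an auxiliary tree, e.g. VP*


def _replace_first(tree: Node, matches: Callable[[Node], bool],
                   build: Callable[[Node], Node]) -> Tuple[Node, bool]:
    """Copy `tree`, replacing the first node (in preorder) that satisfies `matches`."""
    if matches(tree):
        return build(tree), True
    children, done = [], False
    for child in tree.children:
        if not done:
            child, done = _replace_first(child, matches, build)
        children.append(child)
    return Node(tree.label, children, tree.word, tree.subst, tree.foot), done


def substitute(tree: Node, initial: Node) -> Node:
    """Complement: plug an initial tree into the leftmost matching substitution node."""
    new, done = _replace_first(
        tree, lambda n: n.subst and n.label == initial.label, lambda n: initial)
    if not done:
        raise ValueError(f"no open {initial.label} substitution node")
    return new


def adjoin(tree: Node, aux: Node, label: str) -> Node:
    """Adjunct: splice an auxiliary tree in at the first interior node named `label`."""
    def build(target: Node) -> Node:
        # The excised subtree is re-attached at the foot node of the auxiliary tree.
        spliced, has_foot = _replace_first(aux, lambda n: n.foot, lambda n: target)
        if not has_foot:
            raise ValueError("auxiliary tree has no foot node")
        return spliced
    new, done = _replace_first(
        tree, lambda n: n.label == label and not (n.subst or n.foot), build)
    if not done:
        raise ValueError(f"no {label} node to adjoin at")
    return new


def words(tree: Node) -> List[str]:
    if tree.word is not None:
        return [tree.word]
    return [w for child in tree.children for w in words(child)]


def is_complete(tree: Node) -> bool:
    """A derivation is complete only when no substitution node is left open."""
    return not tree.subst and all(is_complete(c) for c in tree.children)


def np(text: str) -> Node:
    return Node("NP", word=text)


BOUGHT = Node("S", [Node("NP", subst=True),
                    Node("VP", [Node("V", word="bought"), Node("NP", subst=True)])])
AT = Node("VP", [Node("VP", foot=True),
                 Node("PP", [Node("P", word="at"), Node("NP", subst=True)])])
ARRANGED = Node("S", [Node("NP", subst=True),
                      Node("VP", [Node("V", word="arranged"), Node("PP", subst=True)])])
FOR = Node("PP", [Node("P", word="for"), Node("NP", subst=True)])

# (4): the PP is an adjunct, added by adjunction at VP.
sentence4 = adjoin(substitute(substitute(BOUGHT, np("Srini")), np("a book")),
                   substitute(AT, np("the bookstore")), "VP")
# (5): the PP is a complement, filled by substitution.
sentence5 = substitute(substitute(substitute(ARRANGED, np("Srini")), FOR), np("a ride"))
# (6): without the PP, the tree anchored by arranged keeps an open node.
sentence6 = substitute(ARRANGED, np("Srini"))

if __name__ == "__main__":
    print(" ".join(words(sentence4)))
    print(" ".join(words(sentence5)))
    print(is_complete(sentence6))
```

In this sketch the tree for bought is already complete before the PP adjoins, whereas the tree for arranged stays incomplete until its PP node is filled, which mirrors the contrast between (3)/(4) and (5)/(6).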



4.3 Non-S constituents 

Although sentential trees are generally considered to be special cases in any grammar, insofar 
as they make up a 'starting category', it is the case that any initial tree constitutes a phrasal 
constituent. These initial trees may have substitution nodes that need to be filled (by other 
initial trees), and may be modified by adjunct trees, exactly as the trees rooted in S. Although 
grouping is possible according to the heads or anchors of these trees, we have not found any 
classification similar to the subcategorization frames for verbs that can be used by a lexical 
entry to 'group select' a set of trees. These trees are selected one by one by each lexical item, 
according to each lexical item's idiosyncrasies. The grammar described by this technical report 
places them into several files for ease of use, but these files do not constitute tree families in 
the way that the subcategorization frames do. 



4.4 Case Assignment 

4.4.1 Approaches to Case 
4.4.1.1 Case in GB theory 

GB (Government and Binding) theory proposes the following 'case filter' as a requirement on
S-structure.^2

Case Filter: Every overt NP must be assigned abstract case. [Haegeman, 1991]



2 There are certain problems with applying the case filter as a requirement at the level of S-structure. These
issues are not crucial to the discussion of the English XTAG implementation of case and so will not be discussed
here. Interested readers are referred to [Lasnik and Uriagereka, 1986].

Abstract case is taken to be universal. Languages with rich morphological case marking,
such as Latin, and languages with very limited morphological case marking, like English, are all 
presumed to have full systems of abstract case that differ only in the extent of morphological 
realization. 

In GB, abstract case is argued to be assigned to NP's by various case assigners, namely 
verbs, prepositions, and INFL. Verbs and prepositions are said to assign accusative case to NP's 
that they govern, and INFL assigns nominative case to NP's that it governs. These governing 
categories are constrained as to where they can assign case by means of 'barriers' based on 
'minimality conditions', although these are relaxed in 'exceptional case marking' situations. 
The details of the GB analysis are beyond the scope of this technical report, but see [Chomsky,
1986] for the original analysis or [Haegeman, 1991] for an overview. Let it suffice for us to say
that the notion of abstract case and the case filter are useful in accounting for a number of 
phenomena including the distribution of nominative and accusative case, and the distribution 
of overt NP's and empty categories (such as PRO). 



4.4.1.2 Minimalism and Case 

A major conceptual difference between GB theories and Minimalism is that in Minimalism, 
lexical items carry their features with them rather than being assigned their features based on 
the nodes that they end up at. For nouns, this means that they carry case with them, and that 
their case is 'checked' when they are in SPEC position of AGR_S or AGR_O, which subsequently
disappears [Chomsky, 1992].



4.4.2 Case in XTAG 

The English XTAG grammar adopts the notion of case and the case filter for many of the same 
reasons argued in the GB literature. However, in some respects the English XTAG grammar's 
implementation of case more closely resembles the treatment in Chomsky's Minimalism framework
[Chomsky, 1992] than the system outlined in the GB literature [Chomsky, 1986]. As in
Minimalism, nouns in the XTAG grammar carry case with them, which is eventually 'checked'.
However, in the XTAG grammar, noun cases are checked against the case values assigned by
the verb during the unification of the feature structures. Unlike Chomsky's Minimalism, there 
are no separate AGR nodes; the case checking comes from the verbs directly. Case assignment 
from the verb is more like the GB approach than the requirement of a SPEC-head relationship 
in Minimalism. 

Most nouns in English do not have separate forms for nominative and accusative case, and 
so they are ambiguous between the two. Pronouns, of course, are morphologically marked for 
case, and each carries the appropriate case in its feature. Figures 4.3(a) and 4.3(b) show the NP
tree anchored by a noun and a pronoun, respectively, along with the feature values associated 
with each word. Note that books simply gets the default case nom/acc, while she restricts the 
case to be nom. 



(a) NP tree anchored by books, case : nom/acc   (b) NP tree anchored by she, case : nom

Figure 4.3: Lexicalized NP trees with case markings



4.4.3 Case Assigners



4.4.3.1 Prepositions 

Case is assigned in the XTAG English grammar by two lexical categories - verbs and
prepositions.^3 Prepositions assign accusative case (acc) through their <assign-case> feature, which
is linked directly to the <case> feature of their objects. Figure 4.4(a) shows a lexicalized
preposition tree, while Figure 4.4(b) shows the same tree with the NP tree from Figure 4.3(a)
substituted into the NP position. Figure 4.4(c) is the tree in Figure 4.4(b) after unification has
taken place. Note that the case ambiguity of books has been resolved to accusative case.
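The way unification resolves the case of books inside a prepositional phrase can be sketched as follows. This is a simplified model, not the XTAG implementation: a case value is represented as a set of atomic values (so nom/acc becomes {nom, acc}), unification is set intersection, and all names are hypothetical.

```python
from typing import FrozenSet, Optional

# None stands for an unspecified value, which unifies with anything.
Case = Optional[FrozenSet[str]]


def unify_case(a: Case, b: Case) -> Case:
    """Unify two case values, raising ValueError when they clash."""
    if a is None:
        return b
    if b is None:
        return a
    common = a & b
    if not common:
        raise ValueError(f"case clash: {sorted(a)} vs {sorted(b)}")
    return common


BOOKS = frozenset({"nom", "acc"})   # default case of most nouns
SHE = frozenset({"nom"})            # pronouns are marked for case
HER = frozenset({"acc"})
P_ASSIGN_CASE = frozenset({"acc"})  # <assign-case> of a preposition


def case_in_pp(noun_case: Case) -> Case:
    """Case of an NP after it is substituted as the object of a preposition."""
    return unify_case(P_ASSIGN_CASE, noun_case)


if __name__ == "__main__":
    print(sorted(case_in_pp(BOOKS)))  # the ambiguity of "books" is resolved
    try:
        case_in_pp(SHE)
    except ValueError as err:
        print("rejected:", err)
```

Under these assumptions a nominative pronoun is rejected as the object of a preposition, while books and her both unify to accusative.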



4.4.3.2 Verbs 

Verbs are the other part of speech in the XTAG grammar that can assign case. Because XTAG 
does not distinguish INFL and VP nodes, verbs must provide case assignment on the subject 
position in addition to the case assigned to their NP complements. 

Assigning case to NP complements is handled by building the case values of the complements
directly into the tree that the case assigner (the verb) anchors. Figures 4.5(a) and 4.5(b) show
an S tree^4 that would be anchored^5 by a transitive and ditransitive verb, respectively. Note
that the case assignments for the NP complements are already in the tree, even though there
is not yet a lexical item anchoring the tree. Since every verb that selects these trees (and other
trees in each respective subcategorization frame) assigns the same case to the complements,



3 For also assigns case as a complementizer. See section 8.5 for more details.

4 Features not pertaining to this discussion have been taken out to improve readability and to make the trees
easier to fit onto the page.

5 The diamond marker (◊) indicates the anchor(s) of a structure if the tree has not yet been lexicalized.

(a) lexicalized preposition tree   (b) the same tree with the NP tree for books substituted in   (c) the tree in (b) after unification

Figure 4.4: Assigning case in prepositional phrases



building case features into the tree has exactly the same result as putting the case feature value 
in each verb's lexical entry. 



(a) Transitive tree: the subject NP0 carries wh : -, case : <3> and agr : <4>, coindexed with
assign-case : <3> and agr : <4> on S and on the top of VP; the bottom of VP and the V anchor share
assign-case : <1> and agr : <2>; the NP1 complement carries case : acc.
(b) Ditransitive tree: the same linkages as in (a), with both the NP1 and NP2 complements
carrying case : acc.

Figure 4.5: Case assignment to NP arguments



The case assigned to the subject position varies with verb form. Since the XTAG grammar
treats the inflected verb as a single unit rather than dividing it into INFL and V nodes, case,
along with tense and agreement, is expressed in the features of verbs, and must be passed in the
appropriate manner. The trees in Figure 4.5 show the path of linkages that joins the <assign-case>
feature of the V to the <case> feature of the subject NP. The morphological form of
the verb determines the value of the <assign-case> feature. Figures 4.6(a) and 4.6(b) show
the same tree^6 anchored by different morphological forms of the verb sing, which give different
values for the <assign-case> feature.

(a), (b): the same tree anchored by different morphological forms of sing; in (b), the progressive form, <assign-case> = none

Figure 4.6: Assigning case according to verb form

The adjunction of an auxiliary verb onto the VP node breaks the <assign-case> link from
the main V, replacing it with a link from the auxiliary verb instead.^7 The progressive form of
the verb in Figure 4.6(b) has the feature-value <assign-case>=none, but this is overridden
by the adjunction of the appropriate form of the auxiliary verb be. Figure 4.7(a) shows the
lexicalized auxiliary tree, while Figure 4.7(b) shows it adjoined into the transitive tree shown
in Figure 4.6(b). The case value passed to the subject NP is now nom (nominative).
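The relinking described above can be sketched in a few lines. This is a deliberate simplification: the grammar passes <assign-case> through feature links on the VP node, whereas the sketch only tracks which verb form currently supplies the value. The set TENSED is a hypothetical stand-in for the morphological database, and the function names are illustrative.

```python
# Hypothetical stand-in for the morphological database: forms treated as tensed.
TENSED = {"sings", "sang", "is", "was", "has"}


def assign_case(verb_form: str) -> str:
    """Tensed forms assign nominative case; untensed forms assign the value 'none'."""
    return "nom" if verb_form in TENSED else "none"


def subject_case(main_verb: str, auxiliaries=()) -> str:
    """Case passed to the subject NP.

    `auxiliaries` are listed in order of adjunction, innermost first. Each
    adjoined auxiliary replaces the <assign-case> link of the verb below it.
    """
    value = assign_case(main_verb)
    for aux in auxiliaries:
        value = assign_case(aux)  # the new link overrides the old one
    return value


if __name__ == "__main__":
    print(subject_case("sings"))                    # tensed main verb
    print(subject_case("singing"))                  # bare progressive
    print(subject_case("singing", ["is"]))          # "is singing"
    print(subject_case("singing", ["been", "has"])) # "has been singing"
```

Only the highest adjoined verb form matters in this model, which is why "is singing" licenses a nominative subject even though singing on its own assigns none.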



4.4.4 PRO in a unification based framework 

Tensed forms of a verb assign nominative case, and untensed forms assign case none, as the
progressive form of the verb sing does in Figure 4.6(b). (This is different from assigning no case
at all, as one form of the infinitive marker to does. See Section 8.5 for more discussion of this
special case.) The distinction of a case none from no case is indicative of a divergence from the
standard GB theory. In GB theory, the absence of case on an NP means that only PRO can
fill that NP. With feature unification as is used in the FB-LTAG grammar, the absence of case
on an NP means that any NP can fill it, regardless of its case. This is due to the mechanism
of unification, in which if something is unspecified, it can unify with anything. Thus we have
a specific case none to handle verb forms that in GB theory do not assign case. PRO is the
only NP with case none. Note that although we are drawn to this treatment by our use of
unification for feature manipulation, our treatment is very similar to the assignment of null
case to PRO in [Chomsky and Lasnik, 1993]. [Watanabe, 1993] also proposes a very similar
approach within Chomsky's Minimalist framework.^8

(a) lexicalized auxiliary tree for be   (b) the auxiliary tree adjoined into the tree of Figure 4.6(b)

Figure 4.7: Proper case assignment with auxiliary verbs

6 Again, the feature structures shown have been restricted to those that pertain to the V/NP interaction.
7 See section 20.1 for a more complete explanation of how this relinking occurs.



8 See Section 8.1 for additional discussion of PRO.
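The difference between assigning case none and assigning no case at all can be illustrated with a small sketch. It assumes the same simplified model of case values as sets, with None standing for an unspecified feature; the toy lexicon and function names are hypothetical.

```python
from typing import List, Optional, Set

# Toy lexicon: the case values each NP carries.
NP_CASE = {
    "she": {"nom"},
    "her": {"acc"},
    "books": {"nom", "acc"},
    "PRO": {"none"},  # PRO is the only NP with case none
}


def can_fill(np: str, slot_case: Optional[Set[str]]) -> bool:
    """True if the NP's case unifies with the case assigned to the slot.

    slot_case=None models a slot whose case is left unspecified: anything
    unifies with it.
    """
    if slot_case is None:
        return True
    return bool(NP_CASE[np] & slot_case)


def fillers(slot_case: Optional[Set[str]]) -> List[str]:
    return sorted(np for np in NP_CASE if can_fill(np, slot_case))


if __name__ == "__main__":
    print(fillers({"nom"}))   # subject of a tensed verb
    print(fillers({"none"}))  # subject of an untensed verb
    print(fillers(None))      # no case specified at all
```

With an explicit none value only PRO can fill the subject of an untensed verb, whereas leaving the feature unspecified would let every NP through, which is the situation the explicit value is meant to rule out.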



Part II 

Verb Classes 
Where to Find What



The two page table that follows gives an overview of what types of trees occur in various tree
families with pointers to discussion in this report. An entry in a cell of the table indicates that
the tree(s) for the construction named in the row header are included in the tree family named
in the column header. Entries are of two types. If the particular tree(s) are displayed and/or
discussed in this report the entry gives a page number reference to the relevant discussion or
figure.^1 Otherwise, a √ indicates inclusion in the tree family but no figure or discussion related
specifically to that tree in this report. Blank cells indicate that there are no trees for the
construction named in the row header in the tree family named in the column header. Two
tables are given below. The first one gives the expansion of abbreviations in the table headers.
The second table gives the name given to each tree family in the actual XTAG grammar. This
makes it easier to find the description of each tree family in Chapter 6 and to compare the
description with the online XTAG grammar.



1 Since Chapter 6 has a brief discussion and a declarative tree for every tree family, page references are given
only for other sections in which discussion or tree diagrams appear.

Abbreviation                            Full Name

Sent. Subj. w. to                       Sentential Subject with to PP complement
Pred. Mult-wd. ARB, P                   Predicative Multi-word PP with Adv, Prep anchors
Pred. Mult-wd. A, P                     Predicative Multi-word PP with Adj, Prep anchors
Pred. Mult-wd. N, P                     Predicative Multi-word PP with Noun, Prep anchors
Pred. Mult-wd. P, P                     Predicative Multi-word PP with two Prep anchors
Pred. Mult-wd. no int. mod.             Predicative Multi-word PP with no internal modification
Pred. Sent. Subj., ARB, P               Predicative PP with Sentential Subject, and Adv, Prep anchors
Pred. Sent. Subj., A, P                 Predicative PP with Sentential Subject, and Adj, Prep anchors
Pred. Sent. Subj., Conj, P              Predicative PP with Sentential Subject, and Conj, Prep anchors
Pred. Sent. Subj., N, P                 Predicative PP with Sentential Subject, and Noun, Prep anchors
Pred. Sent. Subj., P, P                 Predicative PP with Sentential Subject, and two Prep anchors
Pred. Sent. Subj., no int-mod           Predicative PP with Sentential Subject, no internal modification
Pred. Locative                          Predicative anchored by a Locative Adverb
Pred. A Sent. Subj., Comp.              Predicative Adjective with Sentential Subject and Complement
Sentential Comp. with NP                Sentential Complement with NP
Pred. Mult wd. V, P                     Predicative Multi-word with Verb, Prep anchors
Adj. Sm. Cl. w. Sentential Subj.        Adjective Small Clause with Sentential Subject
NP Sm. Clause w. Sentential Subj.       NP Small Clause with Sentential Subject
PP Sm. Clause w. Sentential Subj.       PP Small Clause with Sentential Subject
NP Sm. Cl. w. Sent. Comp.               NP Small Clause with Sentential Complement
Adj. Sm. Cl. w. Sent. Comp.             Adjective Small Clause with Sentential Complement
Exhaustive PP Sm. Cl.                   Exhaustive PP Small Clause
Ditrans. Light Verbs w. PP Shift        Ditransitive Light Verbs with PP Shift
Ditrans. Light Verbs w/o PP Shift       Ditransitive Light Verbs without PP Shift
Y/N question                            Yes/No question
Wh-mov. NP complement                   Wh-moved NP complement
Wh-mov. S comp.                         Wh-moved S complement
Wh-mov. Adj comp.                       Wh-moved Adjective complement
Wh-mov. object of a P                   Wh-moved object of a P
Wh-mov. PP                              Wh-moved PP
Topic. NP complement                    Topicalized NP complement
Det. gerund                             Determiner gerund
Rel. cl. on NP comp.                    Relative clause on NP complement
Rel. cl. on PP comp.                    Relative clause on PP complement
Rel. cl. on NP object of P              Relative clause on NP object of P
Pass. with wh-moved subj.               Passive with wh-moved subject (with and without by phrase)
Pass. w. wh-mov. ind. obj.              Passive with wh-moved indirect object (with and without by phrase)
Pass. w. wh-mov. obj. of the by phrase  Passive with wh-moved object of the by phrase
Pass. w. wh-mov. by phrase              Passive with wh-moved by phrase
Trans. Idiom with V, D and N            Transitive Idiom with Verb, Det and Noun anchors
Idiom with V, D, N                      Idiom with V, D, and N anchors
Idiom with V, D, A, N                   Idiom with V, D, A, and N anchors
Idiom with V, N                         Idiom with V, and N anchor
Idiom with V, A, N                      Idiom with V, A, and N anchors
Idiom with V, D, N, P                   Idiom with V, D, N, and Prep anchors
Idiom with V, D, A, N, P                Idiom with V, D, A, N, and Prep anchors
Idiom with V, N, P                      Idiom with V, N, and Prep anchors
Idiom with V, A, N, P                   Idiom with V, A, N, and Prep anchors
Full Name                                                           XTAG Name

Intransitive Sentential Subject                                     Ts0V
Sentential Subject with 'to' complement                             Ts0Vtonx1
PP Small Clause, with Adv and Prep anchors                          Tnx0ARBPnx1
PP Small Clause, with Adj and Prep anchors                          Tnx0APnx1
PP Small Clause, with Noun and Prep anchors                         Tnx0NPnx1
PP Small Clause, with Prep anchors                                  Tnx0PPnx1
PP Small Clause, with Prep and Noun anchors                         Tnx0PNaPnx1
PP Small Clause with Sentential Subject, and Adv and Prep anchors   Ts0ARBPnx1
PP Small Clause with Sentential Subject, and Adj and Prep anchors   Ts0APnx1
PP Small Clause with Sentential Subject, and Noun and Prep anchors  Ts0NPnx1
PP Small Clause with Sentential Subject, and Prep anchors           Ts0PPnx1
PP Small Clause with Sentential Subject, and Prep and Noun anchors  Ts0PNaPnx1
Exceptional Case Marking                                            TXnx0Vs1
Locative Small Clause with Ad anchor                                Tnx0nx1ARB
Predicative Adjective with Sentential Subject and Complement        Ts0A1s1
Transitive                                                          Tnx0Vnx1
Ditransitive with PP shift                                          Tnx0Vnx1tonx2
Ditransitive                                                        Tnx0Vnx1nx2
Ditransitive with PP                                                Tnx0Vnx1pnx2
Sentential Complement with NP                                       Tnx0Vnx1s2
Intransitive Verb Particle                                          Tnx0Vpl
Transitive Verb Particle                                            Tnx0Vplnx1
Ditransitive Verb Particle                                          Tnx0Vplnx1nx2
Intransitive with PP                                                Tnx0Vpnx1
Sentential Complement                                               Tnx0Vs1
Light Verbs                                                         Tnx0lVN1
Ditransitive Light Verbs with PP Shift                              Tnx0lVN1Pnx2
Adjective Small Clause with Sentential Subject                      Ts0Ax1
NP Small Clause with Sentential Subject                             Ts0N1
PP Small Clause with Sentential Subject                             Ts0Pnx1
Predicative Multi-word with Verb, Prep anchors                      Tnx0VPnx1
Adverb It-Cleft                                                     TItVad1s2
NP It-Cleft                                                         TItVnx1s2
PP It-Cleft                                                         TItVpnx1s2
Adjective Small Clause Tree                                         Tnx0Ax1
Adjective Small Clause with Sentential Complement                   Tnx0A1s1
Equative BE                                                         Tnx0BEnx1
NP Small Clause                                                     Tnx0N1
NP with Sentential Complement Small Clause                          Tnx0N1s1
PP Small Clause                                                     Tnx0Pnx1
Exhaustive PP Small Clause                                          Tnx0Px1
Intransitive                                                        Tnx0V
Intransitive with Adjective                                         Tnx0Vax1
Transitive Sentential Subject                                       Ts0Vnx1
Idiom with V, D and N                                               Tnx0VDN1
Idiom with V, D, A, and N anchors                                   Tnx0VDAN1
Idiom with V and N anchors                                          Tnx0VN1
Idiom with V, A, and N anchors                                      Tnx0VAN1
Idiom with V, D, N, and Prep anchors                                Tnx0VDN1Pnx2
Idiom with V, D, A, N, and Prep anchors                             Tnx0VDAN1Pnx2
Idiom with V, N, and Prep anchors                                   Tnx0VN1Pnx2
Idiom with V, A, N, and Prep anchors                                Tnx0VAN1Pnx2
[Table: constructions (rows) by tree families (columns). The cell entries (check marks and page references) did not survive text extraction; only the headers are recoverable.]

Tree families (columns): Intransitive Sentential Subj; Sent. Subj. w. to; Pred. Mult-wd. ARB, P; Pred. Mult-wd. A, P;
Pred. Mult-wd. N, P; Pred. Mult-wd. P, P; Pred. Mult-wd. no int. mod.; Pred. Sent. Subj., ARB, P; Pred. Sent. Subj., A, P;
Pred. Sent. Subj., N, P; Pred. Sent. Subj., P, P; Pred. Sent. Subj., no int-mod; ECM; Pred. Locative; Pred. A Sent. Subj., Comp.

Constructions (rows): Declarative; Passive w/ & w/o by phrase; Y/N quest.; Wh-moved subject; Wh-mov. NP complement, DO or IO;
Wh-mov. S comp.; Wh-mov. Adj. or Adv. comp.; Wh-mov. object of a P; Wh-mov. PP; Topic. NP comp.; Imperative; Det. gerund;
NP gerund; Ergative; Rel. cl. on subj. w/ NP; Rel. cl. on subj. w/ Comp; Rel. cl. on NP comp., DO, IO w/ NP;
Rel. cl. on NP comp., DO, IO w/ Comp; Rel. cl. on PP comp. w/ pied-piping; Rel. cl. on NP object of P w/ NP;
Rel. cl. on NP object of P w/ Comp; Rel. cl. on adjunct w/ PP; Rel. cl. on adjunct w/ Comp; Pass. w. wh-mov. subj.;
Pass. w. wh-mov. ind. obj.; Pass. w. wh-mov. obj. of by phrase; Pass. w. wh-mov. by phrase
[Table: constructions (rows) by tree families (columns). The cell entries (check marks and page references) did not survive text extraction; only the headers are recoverable.]

Tree families (columns): Transitive; Ditransitive with PP shift; Ditransitive; Ditransitive with PP; Sentential Comp. with NP;
Intransitive Verb Particle; Transitive Verb Particle; Ditransitive Verb Particle; Intransitive with PP; Sentential Complement;
Trans. Light Vs; Ditrans. Light Vs; Adj. Sm. Cl. w. Sentential Subj.; NP Sm. Cl. w. Sentential Subj.;
PP Sm. Cl. w. Sentential Subj.; Pred. Mult. wd. V, P

Constructions (rows): Declarative; Passive w/ & w/o by phrase; Y/N quest.; Wh-moved subject; Wh-mov. NP complement, DO or IO;
Wh-mov. S comp.; Wh-mov. Adj. or Adv. comp.; Wh-mov. object of a P; Wh-mov. PP; Topic. NP comp.; Imperative; Det. gerund;
NP gerund; Ergative; Rel. cl. on subj. w/ NP; Rel. cl. on subj. w/ Comp; Rel. cl. on NP comp., DO, IO w/ NP;
Rel. cl. on NP comp., DO, IO w/ Comp; Rel. cl. on PP comp. w/ pied-piping; Rel. cl. on NP object of P w/ NP;
Rel. cl. on NP object of P w/ Comp; Rel. cl. on adjunct w/ PP; Rel. cl. on adjunct w/ Comp; Parenthetical quoting clause;
Past-participial as arg Adj; Past-participial NP pre-mod; Pass. w. wh-mov. subj.; Pass. w. wh-mov. ind. obj.;
Pass. w. wh-mov. obj. of by phrase; Pass. w. wh-mov. by phrase
[Table: constructions (rows) by tree families (columns). The cell entries (check marks and page references) did not survive text extraction; only the headers are recoverable.]

Tree families (columns): Adverb It-Cleft; NP It-Cleft; PP It-Cleft; Adj. Small Clause; Adj. Sm. Cl. w. Sent. Comp.; Equative BE;
NP Small Clause; NP Sm. Cl. w. Sent. Comp.; PP Small Clause; Exhaustive PP Sm. Cl.; Intransitive; Intransitive with Adjective;
Transitive Sentential Subj

Constructions (rows): Declarative; Passive w/ & w/o by phrase; Y/N quest.; Wh-moved subject; Wh-mov. NP complement, DO or IO;
Wh-mov. S comp.; Wh-mov. Adj. or Adv. comp.; Wh-mov. object of a P; Wh-mov. PP; Topic. NP comp.; Imperative; Det. gerund;
NP gerund; Ergative; Rel. cl. on subj. w/ NP; Rel. cl. on subj. w/ Comp; Rel. cl. on NP comp., DO, IO w/ NP;
Rel. cl. on NP comp., DO, IO w/ Comp; Rel. cl. on PP comp. w/ pied-piping; Rel. cl. on NP object of P w/ NP;
Rel. cl. on NP object of P w/ Comp; Rel. cl. on adjunct w/ PP; Rel. cl. on adjunct w/ Comp; Participial NP pre-mod;
Pass. w. wh-mov. subj.; Pass. w. wh-mov. ind. obj.; Pass. w. wh-mov. obj. of by phrase; Pass. w. wh-mov. by phrase
[Table: constructions (rows) by tree families (columns). Neither the column headers nor the cell entries of this part of the table survived text extraction; only the row headers are recoverable.]

Constructions (rows): Declarative; Passive w/ & w/o by phrase; Y/N quest.; Wh-moved subject; Wh-mov. NP complement, DO or IO;
Wh-mov. S comp.; Wh-mov. Adj. or Adv. comp.; Wh-mov. object of a P; Wh-mov. PP; Topic. NP comp.; Imperative; Det. gerund;
NP gerund; Ergative; Rel. cl. on subj. w/ NP; Rel. cl. on subj. w/ Comp; Rel. cl. on NP comp., DO, IO w/ NP;
Rel. cl. on NP comp., DO, IO w/ Comp; Rel. cl. on PP comp. w/ pied-piping; Rel. cl. on NP object of P w/ NP;
Rel. cl. on NP object of P w/ Comp; Rel. cl. on adjunct w/ PP; Rel. cl. on adjunct w/ Comp; Pass. w. wh-mov. subj.;
Pass. w. wh-mov. ind. obj.; Pass. w. wh-mov. obj. of by phrase; Pass. w. wh-mov. by phrase
Chapter 6

Verb Classes 



Each main^1 verb in the syntactic lexicon selects at least one tree family^2 (subcategorization
frame). Since the tree database and syntactic lexicon are already separated for space efficiency
(see Chapter 3), each verb can efficiently select a large number of trees by specifying a tree
family, as opposed to each of the individual trees. This approach allows for a considerable
reduction in the number of trees that must be specified for any given verb or form of a verb.

There are currently 52 tree families in the system.^3 This chapter gives a brief description
of each tree family and shows the corresponding declarative tree^4, along with any peculiar
characteristics or trees. It also indicates which transformations are in each tree family, and
gives the number of verbs that select that family.^5 A few sample verbs are given, along with
example sentences.



6.1 Intransitive: Tnx0V



Description: This tree family is selected by verbs that do not require an object complement 
of any type. Adverbs, prepositional phrases and other adjuncts may adjoin on, but are 
not required for the sentences to be grammatical. 1,878 verbs select this family. 

Examples: eat, sleep, dance 
Al ate . 
Seth slept . 
Hyun danced . 



Declarative tree: See Figure ST 



Other available trees: wh-moved subject, subject relative clause with and without comp, 
adjunct (gap-less) relative clause with comp, adjunct (gap-less) relative clause with PP 
pied-piping, imperative, determiner gerund, NP gerund, pre-nominal participial.



¹ Auxiliary verbs are handled under a different mechanism. See the chapter on auxiliaries for details.

² See section 3.1.2 for an explanation of tree families.

³ An explanation of the naming convention used in naming the trees and tree families is available in the
appendices.

⁴ Before lexicalization, the ◊ indicates the anchor of the tree.

⁵ Numbers given are as of August 1998 and are subject to some change with further development of the
grammar.

Figure 6.1: Declarative Intransitive Tree: αnx0V



6.2 Transitive: Tnx0Vnx1



Description: This tree family is selected by verbs that require only an NP object complement.
The NP's may be complex structures, including gerund NP's and NP's that take sentential
complements. This does not include light verb constructions (see sections 6.15 and 6.16).
4,343 verbs select the transitive tree family.

Examples: eat, dance, take, like 
Al ate an apple . 
Seth danced the tango . 
Hyun is taking an algorithms course . 
Anoop likes the fact that the semester is finished . 



Declarative tree: See Figure 6.2



Figure 6.2: Declarative Transitive Tree: αnx0Vnx1



Other available trees: wh-moved subject, wh-moved object, subject relative clause with and 
without comp, adjunct (gap-less) relative clause with comp/with PP pied-piping, object 
relative clause with and without comp, imperative, determiner gerund, NP gerund, passive 
with by phrase, passive without by phrase, passive with wh-moved subject and by phrase, 
passive with wh-moved subject and no by phrase, passive with wh-moved object out of the 
by phrase, passive with wh-moved by phrase, passive with relative clause on subject and 
by phrase with and without comp, passive with relative clause on subject and no by phrase 
with and without comp, passive with relative clause on object of the by phrase with and
without comp/with PP pied-piping, gerund passive with by phrase, gerund passive without
by phrase, ergative, ergative with wh-moved subject, ergative with subject relative clause
with and without comp, ergative with adjunct (gap-less) relative clause with comp/with
PP pied-piping. In addition, two other trees that allow transitive verbs to function as
adjectives (e.g. the stopped truck) are also in the family.



6.3 Ditransitive: Tnx0Vnx1nx2



Description: This tree family is selected by verbs that take exactly two NP complements. It
does not include verbs that undergo the ditransitive verb shift (see section 6.5). The
apparent ditransitive alternates involving verbs in this class and benefactive PP's (e.g.
John baked a cake for Mary) are analyzed as transitives (see section 6.2) with a PP
adjunct. Benefactives are taken to be adjunct PP's because they are optional (e.g. John
baked a cake vs. John baked a cake for Mary). 122 verbs select the ditransitive tree family.



Examples: ask, cook, win 

Christy asked Mike a question . 
Doug cooked his father dinner . 
Dania won her sister a stuffed animal .



Declarative tree: See Figure 6.3



Figure 6.3: Declarative Ditransitive Tree: αnx0Vnx1nx2



Other available trees: wh-moved subject, wh-moved direct object, wh-moved indirect ob- 
ject, subject relative clause with and without comp, adjunct (gap-less) relative clause 
with comp/with PP pied-piping, direct object relative clause with and without comp, in- 
direct object relative clause with and without comp, imperative, determiner gerund, NP 
gerund, passive with by phrase, passive without by phrase, passive with wh-moved subject 
and by phrase, passive with wh-moved subject and no by phrase, passive with wh-moved 
object out of the by phrase, passive with wh-moved by phrase, passive with wh-moved 
indirect object and by phrase, passive with wh-moved indirect object and no by phrase, 
passive with relative clause on subject and by phrase with and without comp, passive with 
relative clause on subject and no by phrase with and without comp, passive with relative 
clause on object of the by phrase with and without comp/with PP pied-piping, passive 
with relative clause on the indirect object and by phrase with and without comp, passive 
with relative clause on the indirect object and no by phrase with and without comp, passive
with/without by-phrase with adjunct (gap-less) relative clause with comp/with PP
pied-piping, gerund passive with by phrase, gerund passive without by phrase.

6.4 Ditransitive with PP: Tnx0Vnx1pnx2

Description: This tree family is selected by ditransitive verbs that take a noun phrase followed 
by a prepositional phrase. The preposition is not constrained in the syntactic lexicon. The 
preposition must be required and not optional - that is, the sentence must be ungrammatical
with just the noun phrase (e.g. *John put the table). No verbs, therefore, should
select both this tree family and the transitive tree family (see section 6.2). This tree
family is also distinguished from the ditransitive verbs, such as give, that undergo verb
shifting (see section 6.5). There are 62 verbs that select this tree family.

Examples: associate, put, refer 

Rostenkowski associated money with power . 

He put his reputation on the line . 

He referred all questions to his attorney . 



Declarative tree: See Figure 6.4 




Figure 6.4: Declarative Ditransitive with PP Tree: αnx0Vnx1pnx2



Other available trees: wh-moved subject, wh-moved direct object, wh-moved object of PP, 
wh-moved PP, subject relative clause with and without comp, adjunct (gap-less) relative 
clause with comp/with PP pied-piping, direct object relative clause with and without 
comp, object of PP relative clause with and without comp/with PP pied-piping, imper- 
ative, determiner gerund, NP gerund, passive with by phrase, passive without by phrase, 
passive with wh-moved subject and by phrase, passive with wh-moved subject and no by 
phrase, passive with wh-moved object out of the by phrase, passive with wh-moved by 
phrase, passive with wh-moved object out of the PP and by phrase, passive with wh-moved 
object out of the PP and no by phrase, passive with wh-moved PP and by phrase, passive 
with wh-moved PP and no by phrase, passive with relative clause on subject and by phrase 
with and without comp, passive with relative clause on subject and no by phrase with and 
without comp, passive with relative clause on object of the by phrase with and without 
comp/with PP pied-piping, passive with relative clause on the object of the PP and by 
phrase with and without comp/with PP pied-piping, passive with relative clause on the 
object of the PP and no by phrase with and without comp/with PP pied-piping, passive 
with and without by phrase with adjunct (gap-less) relative clause with comp/with PP 
pied-piping, gerund passive with by phrase, gerund passive without by phrase. 



6.5 Ditransitive with PP shift: Tnx0Vnx1tonx2

Description: This tree family is selected by ditransitive verbs that undergo a shift to a to
prepositional phrase. These ditransitive verbs are clearly constrained so that when they
shift, the prepositional phrase must start with to. This is in contrast to the Ditransitives
with PP in section 6.4, in which verbs may appear in [NP V NP PP] constructions with
a variety of prepositions. Both the dative shifted and non-shifted PP complement trees
are included. 56 verbs select this family.

Examples: give, promise, tell 
Bill gave Hillary flowers . 
Bill gave flowers to Hillary . 
Whitman promised the voters a tax cut . 
Whitman promised a tax cut to the voters . 
Pinnochino told Gepetto a lie . 
Pinnochino told a lie to Gepetto . 

Declarative tree: See Figure 6.5



Figure 6.5: Declarative Ditransitive with PP shift Trees: αnx0Vnx1Pnx2 (a) and
αnx0Vnx2nx1 (b)



Other available trees: Non-shifted: wh-moved subject, wh-moved direct object, wh-moved 
indirect object, subject relative clause with and without comp, adjunct (gap-less) relative 
clause with comp/with PP pied-piping, direct object relative clause with comp/with PP 
pied-piping, indirect object relative clause with and without comp/with PP pied-piping, 
imperative, NP gerund, passive with by phrase, passive without by phrase, passive with 
wh-moved subject and by phrase, passive with wh-moved subject and no by phrase, passive 
with wh-moved object out of the by phrase, passive with wh-moved by phrase, passive 
with wh-moved indirect object and by phrase, passive with wh-moved indirect object and 
no by phrase, passive with relative clause on subject and by phrase with and without 
comp, passive with relative clause on subject and no by phrase with and without comp, 
passive with relative clause on object of the by phrase with and without comp/with PP
pied-piping, passive with relative clause on the indirect object and by phrase with and
without comp/with PP pied-piping, passive with relative clause on the indirect object 
and no by phrase with and without comp/with PP pied-piping, passive with/without 
by-phrase with adjunct (gap-less) relative clause with comp/with PP pied-piping, gerund 
passive with by phrase, gerund passive without by phrase; 

Shifted: wh-moved subject, wh-moved direct object, wh-moved object of PP, wh-moved 
PP, subject relative clause with and without comp, adjunct (gap-less) relative clause with 
comp/with PP pied-piping, direct object relative clause with comp/with PP pied-piping, 
object of PP relative clause with and without comp/with PP pied-piping, imperative, 
determiner gerund, NP gerund, passive with by phrase, passive without by phrase, passive 
with wh-moved subject and by phrase, passive with wh-moved subject and no by phrase, 
passive with wh-moved object out of the by phrase, passive with wh-moved by phrase, 
passive with wh-moved object out of the PP and by phrase, passive with wh-moved object 
out of the PP and no by phrase, passive with wh-moved PP and by phrase, passive with 
wh-moved PP and no by phrase, passive with relative clause on subject and by phrase 
with and without comp, passive with relative clause on subject and no by phrase with and 
without comp, passive with relative clause on object of the by phrase with and without 
comp/with PP pied-piping, passive with relative clause on the object of the PP and by 
phrase with and without comp/with PP pied-piping, passive with relative clause on the 
object of the PP and no by phrase with and without comp/with PP pied-piping, passive 
with/without by-phrase with adjunct (gap-less) relative clause with comp/with PP
pied-piping, gerund passive with by phrase, gerund passive without by phrase.

6.6 Sentential Complement with NP: Tnx0Vnx1s2

Description: This tree family is selected by verbs that take both an NP and a sentential 
complement. The sentential complement may be infinitive or indicative. The type of 
clause is specified by each individual verb in its syntactic lexicon entry. A given verb 
may select more than one type of sentential complement. The declarative tree, and many 
other trees in this family, are auxiliary trees, as opposed to the more common initial trees. 
These auxiliary trees adjoin onto an S node in an existing tree of the type specified by 
the sentential complement. This is the mechanism by which TAGs are able to maintain 
long-distance dependencies (see Chapter 13), even over multiple embeddings (e.g. What
did Bill tell Mary that John said?). 79 verbs select this tree family. 
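
The effect of adjunction on a long-distance dependency can be sketched in a few lines of Python. This is a simplified illustration, not the XTAG implementation: the `Node` class and helper names are hypothetical, features are ignored, and the inverted auxiliary and the complementizer are folded into the auxiliary tree for brevity. The wh-word and its gap start out in the same initial tree, and adjoining at the inner S node stretches them apart.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # category, or the word itself at a leaf
    children: list = field(default_factory=list)
    is_foot: bool = False           # marks the foot node (S*) of an auxiliary tree


def find_foot(node):
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if node.is_foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target, aux):
    """Adjoin auxiliary tree `aux` at `target`, modifying both in place."""
    foot = find_foot(aux)
    assert foot is not None and aux.label == foot.label == target.label
    foot.children = target.children   # old subtree now hangs below the foot
    foot.is_foot = False
    target.children = aux.children    # auxiliary material takes the old position


def yield_of(node):
    """Left-to-right words of a tree; empty labels stand for gaps."""
    if not node.children:
        return [] if (node.is_foot or node.label == "") else [node.label]
    words = []
    for child in node.children:
        words.extend(yield_of(child))
    return words


# Initial tree: "what [S John said GAP]"; the filler and its gap are local.
inner_s = Node("S", [Node("NP", [Node("John")]),
                     Node("VP", [Node("V", [Node("said")]),
                                 Node("NP", [Node("")])])])
initial = Node("S", [Node("NP", [Node("what")]), inner_s])

# Auxiliary tree: "did Bill tell Mary that S*".
aux = Node("S", [Node("did"),
                 Node("NP", [Node("Bill")]),
                 Node("VP", [Node("V", [Node("tell")]),
                             Node("NP", [Node("Mary")]),
                             Node("that"),
                             Node("S", is_foot=True)])])

adjoin(inner_s, aux)
sentence = " ".join(yield_of(initial))
```

Further auxiliary trees could be adjoined at the same kind of S node to produce deeper embeddings, with the filler and the gap still originating in one elementary tree.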

Examples: beg, expect, tell 

Srini begged Mark to increase his disk quota . 
Beth told Jim that it was his turn . 

Declarative tree: See Figure 6.6

Figure 6.6: Declarative Sentential Complement with NP Tree: βnx0Vnx1s2

Other available trees: wh-moved subject, wh-moved object, wh-moved sentential complement,
subject relative clause with and without comp, adjunct (gap-less) relative clause
with comp/with PP pied-piping, object relative clause with and without comp, imperative,
determiner gerund, NP gerund, passive with by phrase before sentential complement, passive
with by phrase after sentential complement, passive without by phrase, passive with
wh-moved subject and by phrase before sentential complement, passive with wh-moved
subject and by phrase after sentential complement, passive with wh-moved subject and
no by phrase, passive with wh-moved object out of the by phrase, passive with wh-moved
by phrase, passive with relative clause on subject and by phrase before sentential complement
with and without comp, passive with relative clause on subject and by phrase after
sentential complement with and without comp, passive with relative clause on subject and
no by phrase with and without comp, passive with/without by-phrase with adjunct (gap-less)
relative clause with comp/with PP pied-piping, gerund passive with by phrase before
sentential complement, gerund passive with by phrase after the sentential complement,
gerund passive without by phrase, parenthetical reporting clause.

6.7 Intransitive Verb Particle: Tnx0Vpl

Description: The trees in this tree family are anchored by both the verb and the verb parti- 
cle. Both appear in the syntactic lexicon and together select this tree family. Intransitive 
verb particles can be difficult to distinguish from intransitive verbs with adverbs adjoined 
on. The main diagnostics for including verbs in this class are whether the meaning is 
compositional or not, and whether there is a transitive version of the verb/verb particle 
combination with the same or similar meaning. The existence of an alternate composi- 
tional meaning is a strong indication for a separate verb particle construction. There are 
159 verb/verb particle combinations. 

Examples: add up, come out, sign off 
The numbers never quite added up . 
John finally came out (of the closet) . 
I think that I will sign off now . 

Declarative tree: See Figure 6.7

Other available trees: wh-moved subject, subject relative clause with and without comp, 
adjunct (gap-less) relative clause with comp/with PP pied-piping, imperative, determiner 
gerund, NP gerund.

Figure 6.7: Declarative Intransitive Verb Particle Tree: αnx0Vpl



6.8 Transitive Verb Particle: Tnx0Vplnx1

Description: Verb/verb particle combinations that take an NP complement select this tree 
family. Both the verb and the verb particle are anchors of the trees. Particle movement has 
been taken as the diagnostic to distinguish verb particle constructions from intransitives 
with adjoined PP's. If the alleged particle is able to undergo particle movement, in other 
words appear both before and after the direct object, then it is judged to be a particle. 
Items that do not undergo particle movement are taken to be prepositions. In many, 
but not all, of the verb particle cases, there is also an alternate prepositional meaning in
which the lexical item does not move (e.g. He looked up the number (in the phonebook).
He looked the number up. Srini looked up the road (for Purnima's car). *He looked the
road up.). There are 489 verb/verb particle combinations.

Examples: blow off, make up, pick out 

He blew off his linguistics class for the third time . 
He blew his linguistics class off for the third time . 
The dyslexic leprechaun made up the syntactic lexicon . 
The dyslexic leprechaun made the syntactic lexicon up . 
I would like to pick out a new computer . 
I would like to pick a new computer out . 



Declarative tree: See Figure 6.8



Figure 6.8: Declarative Transitive Verb Particle Tree: αnx0Vplnx1 (a) and αnx0Vnx1pl (b)



Other available trees: wh-moved subject with particle before the NP, wh-moved subject 
with particle after the NP, wh-moved object, subject relative clause with particle before
the NP with and without comp, subject relative clause with particle after the NP with and
without comp, object relative clause with and without comp, adjunct (gap-less) relative 
clause with particle before the NP with comp/with PP pied-piping, adjunct (gap-less) 
relative clause with particle after the NP with comp/with PP pied-piping, imperative 
with particle before the NP, imperative with particle after the NP, determiner gerund 
with particle before the NP, NP gerund with particle before the NP, NP gerund with 
particle after the NP, passive with by phrase, passive without by phrase, passive with wh- 
moved subject and by phrase, passive with wh-moved subject and no by phrase, passive 
with wh-moved object out of the by phrase, passive with wh-moved by phrase, passive 
with relative clause on subject and by phrase with and without comp, passive with relative 
clause on subject and no by phrase with and without comp, passive with relative clause on 
object of the by phrase with and without comp/with PP pied-piping, passive with/without 
by-phrase with adjunct (gap-less) relative clause with comp/with PP pied-piping, gerund 
passive with by phrase, gerund passive without by phrase. 



6.9 Ditransitive Verb Particle: Tnx0Vplnx1nx2

Description: Verb/verb particle combinations that select this tree family take 2 NP comple- 
ments. Both the verb and the verb particle anchor the trees, and the verb particle can 
occur before, between, or after the noun phrases. Perhaps because of the complexity of 
the sentence, these verbs do not seem to have passive alternations (*A new bank account
was opened up Michelle by me). There are 4 verb/verb particle combinations that select 
this tree family. The exhaustive list is given in the examples. 

Examples: dish out, open up, pay off, rustle up 
I opened up Michelle a new bank account . 
I opened Michelle up a new bank account . 
I opened Michelle a new bank account up . 
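
The three particle positions can be enumerated with a small Python sketch. It works purely on word strings and is only an illustration: the function name is hypothetical, and the grammar itself encodes each order as a separate tree rather than computing the orders this way.

```python
def particle_orders(verb, particle, complements):
    """Return one word order per particle position: directly after the
    verb, between complements, or after the last complement."""
    orders = []
    for position in range(len(complements) + 1):
        slots = complements[:position] + [particle] + complements[position:]
        orders.append(" ".join([verb] + slots))
    return orders


ditransitive = particle_orders("opened", "up", ["Michelle", "a new bank account"])
transitive = particle_orders("blew", "off", ["his linguistics class"])
```

With two complements the function yields the three orders shown in the examples above; with a single complement it yields the two orders that serve as the particle movement diagnostic in section 6.8.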



Declarative tree: See Figure 6.9



Figure 6.9: Declarative Ditransitive Verb Particle Tree: αnx0Vplnx1nx2 (a),
αnx0Vnx1plnx2 (b) and αnx0Vnx1nx2pl (c)



Other available trees: wh-moved subject with particle before the NP's, wh-moved subject 
with particle between the NP's, wh-moved subject with particle after the NP's, wh-moved 
indirect object with particle before the NP's, wh-moved indirect object with particle after
the NP's, wh-moved direct object with particle before the NP's, wh-moved direct object
with particle between the NP's, subject relative clause with particle before the NP's 
with and without comp, subject relative clause with particle between the NP's with 
and without comp, subject relative clause with particle after the NP's with and without 
comp, indirect object relative clause with particle before the NP's with and without comp, 
indirect object relative clause with particle after the NP's with and without comp, direct 
object relative clause with particle before the NP's with and without comp, direct object 
relative clause with particle between the NP's with and without comp, adjunct (gap- 
less) relative clause with comp/with PP pied-piping, imperative with particle before the 
NP's, imperative with particle between the NP's, imperative with particle after the NP's, 
determiner gerund with particle before the NP's, NP gerund with particle before the NP's, 
NP gerund with particle between the NP's, NP gerund with particle after the NP's. 



6.10 Intransitive with PP: Tnx0Vpnx1

Description: The verbs that select this tree family are not strictly intransitive, in that they 
must be followed by a prepositional phrase. Verbs that are intransitive and simply can 
be followed by a prepositional phrase do not select this family, but instead have the 
PP adjoin onto the intransitive sentence. Accordingly, there should be no verbs in both 
this class and the intransitive tree family (see section 6.1). The prepositional phrase is
not restricted to being headed by any particular lexical item. Note that these are not
transitive verb particles (see section 6.8), since the head of the PP does not move. 169
verbs select this tree family. 
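
Exclusivity requirements of this kind lend themselves to a mechanical check over the syntactic lexicon. The following Python sketch assumes a hypothetical lexicon represented as a mapping from verbs to sets of family names; the verb blick is a nonce entry added only to trigger a violation, and none of this reflects the actual XTAG tools.

```python
# Pairs of families that, per the descriptions, no verb should select together.
EXCLUSIVE_PAIRS = [
    ("Tnx0Vpnx1", "Tnx0V"),        # section 6.10 vs. section 6.1
    ("Tnx0Vnx1pnx2", "Tnx0Vnx1"),  # section 6.4 vs. section 6.2
]


def exclusivity_violations(lexicon):
    """Return (verb, family_a, family_b) for every verb that selects both
    members of an exclusive pair."""
    violations = []
    for verb, families in sorted(lexicon.items()):
        for family_a, family_b in EXCLUSIVE_PAIRS:
            if family_a in families and family_b in families:
                violations.append((verb, family_a, family_b))
    return violations


# Toy lexicon; "blick" is a made-up verb with a deliberately bad entry.
toy_lexicon = {
    "impinge": {"Tnx0Vpnx1"},
    "sleep": {"Tnx0V"},
    "put": {"Tnx0Vnx1pnx2"},
    "blick": {"Tnx0Vpnx1", "Tnx0V"},
}
```

Running `exclusivity_violations(toy_lexicon)` reports only the nonce verb, since it is the only entry that selects both families of an exclusive pair.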

Examples: grab, impinge, provide 
Seth grabbed for the brass ring . 
The noise gradually impinged on Dania's thoughts . 
A good host provides for everyone 's needs . 



Declarative tree: See Figure 6.10. 




Figure 6.10: Declarative Intransitive with PP Tree: αnx0Vpnx1



Other available trees: wh-moved subject, wh-moved object of the PP, wh-moved PP, subject 
relative clause with and without comp, adjunct (gap-less) relative clause with comp/with
PP pied-piping, object of the PP relative clause with and without comp/with PP pied-piping,
imperative, determiner gerund, NP gerund, passive with by phrase, passive without
by phrase, passive with wh-moved subject and by phrase, passive with wh-moved subject
and no by phrase, passive with wh-moved by phrase, passive with relative clause on subject
and by phrase with and without comp, passive with relative clause on subject and no by
phrase with and without comp, passive with relative clause on object of the by phrase with
and without comp/with PP pied-piping, passive with/without by-phrase with adjunct
(gap-less) relative clause with comp/with PP pied-piping, gerund passive with by phrase,
gerund passive without by phrase.



6.11 Predicative Multi-word with Verb, Prep anchors: Tnx0VPnx1

Description: This tree family is selected by multiple anchor verb/preposition pairs which 
together have a non-compositional interpretation. For example, think of has the non-compositional
interpretation involving the inception of a notion or mental entity in addition
to the interpretation in which the agent is thinking about someone or something. Anchors
for this tree must be able to take both gerunds and regular NP's in the second noun
position. To allow adverbs to appear between the verb and the preposition, the trees
contain an extra VP level. Several of the verbs which select the Tnx0Vpnx1 family, but
which should not have quite the freedom it allows, will be moving to this family for the
next release. 28 verb/preposition pairs select this tree family.

Examples: think of, believe in, depend on 
Calvin thought of a new idea . 
Hobbes believes in sleeping all day . 
Bill depends on drinking coffee for stimulation . 



Declarative tree: See Figure 6.11. 




Figure 6.11: Declarative PP Complement Tree: αnx0VPnx1



Other available trees: wh-moved subject, wh-moved object, subject relative clause with and 
without comp, adjunct (gap-less) relative clause with comp/with PP pied-piping, object 
relative clause with and without comp, imperative, determiner gerund, NP gerund, passive
with by phrase, passive without by phrase, passive with wh-moved subject and by phrase,
passive with wh-moved subject and no by phrase, passive with wh-moved object out of the
by phrase, passive with wh-moved by phrase, passive with relative clause on subject and by
phrase with and without comp, passive with relative clause on subject and no by phrase
with and without comp, passive with relative clause on object of the by phrase with
and without comp/with PP pied-piping, passive with/without by-phrase with adjunct
(gap-less) relative clause with comp/with PP pied-piping, gerund passive with by phrase,
gerund passive without by phrase. In addition, two other trees that allow transitive verbs
to function as adjectives (e.g. the thought of idea) are also in the family.



6.12 Sentential Complement: Tnx0Vs1

Description: This tree family is selected by verbs that take just a sentential complement. The 
sentential complement may be of type infinitive, indicative, or small clause (see Chapter 9).
The type of clause is specified by each individual verb in its syntactic lexicon entry, and a
given verb may select more than one type of sentential complement. The declarative tree,
and many other trees in this family, are auxiliary trees, as opposed to the more common
initial trees. These auxiliary trees adjoin onto an S node in an existing tree of the type
specified by the sentential complement. This is the mechanism by which TAGs are able
to maintain long-distance dependencies (see Chapter 13), even over multiple embeddings
(e.g. What did Bill think that John said?). 338 verbs select this tree family.

Examples: consider, think 

Dania considered the algorithm unworkable . 
Srini thought that the program was working . 



Declarative tree: See Figure 6.12.



Figure 6.12: Declarative Sentential Complement Tree: βnx0Vs1



Other available trees: wh-moved subject, wh-moved sentential complement, subject relative 
clause with and without comp, adjunct (gap-less) relative clause with comp/with PP pied- 
piping, imperative, determiner gerund, NP gerund, parenthetical reporting clause. 



6.13 Intransitive with Adjective: Tnx0Vax1

Description: The verbs that select this tree family take an adjective as a complement. The
adjective may be regular, comparative, or superlative. It may also be formed from the
special class of adjectives derived from the transitive verbs (e.g. agitated, broken; see
section 6.2). Unlike the Intransitive with PP verbs (see section 6.10), some of these verbs
may also occur as bare intransitives. This distinction is drawn because adjectives
do not normally adjoin onto sentences, as prepositional phrases do. Other intransitive
verbs can only occur with the adjective, and these select only this family. The verb class is
also distinguished from the adjective small clauses (see section 6.20) because these verbs
are not raising verbs. 34 verbs select this tree family.

Examples: become, grow, smell 

The greenhouse became hotter . 
The plants grew tall and strong . 
The flowers smelled wonderful . 

Declarative tree: See Figure 6.13.



Figure 6.13: Declarative Intransitive with Adjective Tree: αnx0Vax1



Other available trees: wh-moved subject, wh-moved adjective (how), subject relative clause 
with and without comp, adjunct (gap-less) relative clause with comp/with PP pied-piping, 
imperative, NP gerund. 



6.14 Transitive Sentential Subject: Ts0Vnx1

Description: The verbs that select this tree family all take sentential subjects, and are often 
referred to as 'psych' verbs, since they all refer to some psychological state of mind. The 
sentential subject can be indicative (complementizer required) or infinitive (complementizer
optional). 100 verbs select this tree family.

Examples: delight, impress, surprise 

That the tea had rosehips in it delighted Christy .
To even attempt a marathon impressed Dania .
For Jim to have walked the dogs surprised Beth . 

Declarative tree: See Figure 6.14.

Figure 6.14: Declarative Sentential Subject Tree: αs0Vnx1



Other available trees: wh-moved subject, wh-moved object, subject relative clause with and 
without comp, adjunct (gap-less) relative clause with comp/with PP pied-piping. 



6.15 Light Verbs: Tnx0lVN1

Description: The verb/noun pairs that select this tree family are pairs in which the interpretation
is non-compositional and the noun contributes argument structure to the predicate
(e.g. The man took a walk. vs. The man took a radio). The verb and the noun occur
together in the syntactic database, and both anchor the trees. The verbs in the light verb
constructions are do, give, have, make and take. The noun following the light verb is (usually)
in a bare infinitive form (have a good cry) and usually occurs with a(n). However,
we include deverbal nominals (take a bath, give a demonstration) as well. Constructions
with nouns that do not contribute an argument structure (have a cigarette, give NP a
black eye) are excluded. In addition to semantic considerations of light verbs, they differ
syntactically from Transitive verbs (section 6.2) as well in that the noun in the light verb
construction does not extract. Some of the verb-noun anchors for this family, like take
aim and take hold, disallow determiners, while others require particular determiners. For
example, have think must be indefinite and singular, as attested by the ungrammaticality
of *John had the think/some thinks. Another anchor, take leave, can occur either bare
or with a possessive pronoun (e.g., John took his leave, but not *John took the leave).
This is accomplished through feature specification on the lexical entries. There are 259
verb/noun pairs that select the light verb tree.
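
How lexical feature specifications can enforce such determiner restrictions is sketched below in Python. The feature values ("none", "indefinite-singular", "possessive") and the function names are invented for this illustration and are not the XTAG feature inventory; only the four anchors and their restrictions are taken from the description above.

```python
# Hypothetical determiner constraints on light verb anchors. The feature
# values are invented labels, not the features used in the XTAG lexicon.
DET_CONSTRAINTS = {
    ("take", "aim"): {"none"},
    ("take", "hold"): {"none"},
    ("have", "think"): {"indefinite-singular"},
    ("take", "leave"): {"none", "possessive"},
}

POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


def determiner_type(det):
    """Classify a determiner (or None for a bare noun) into a toy value."""
    if det is None:
        return "none"
    if det in {"a", "an"}:
        return "indefinite-singular"
    if det in POSSESSIVES:
        return "possessive"
    return "other"


def licensed(verb, noun, det):
    """True if the determiner is compatible with the anchor's constraint.
    Anchors without an entry are treated as unconstrained."""
    allowed = DET_CONSTRAINTS.get((verb, noun))
    return allowed is None or determiner_type(det) in allowed
```

Under these assumptions `licensed("take", "leave", "his")` holds while `licensed("take", "leave", "the")` does not, mirroring the contrast between John took his leave and *John took the leave.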

Examples: give groan, have discussion, make comment 
The audience gave a collective groan . 
We had a big discussion about closing the libraries . 
The professors made comments on the paper . 



Declarative tree: See Figure 6.15



Other available trees: wh-moved subject, subject relative clause with and without comp, 
adjunct (gap-less) relative clause with comp/with PP pied-piping, imperative, determiner 
gerund, NP gerund. 



Figure 6.15: Declarative Light Verb Tree: αnx0lVN1



6.16 Ditransitive Light Verbs with PP Shift: Tnx0lVN1Pnx2

Description: The verb/noun pairs that select this tree family are pairs in which the interpre- 
tation is non-compositional and the noun contributes argument structure to the predicate 
(e.g. Dania made Srini a cake. vs. Dania made Srini a loan). The verb and the noun 
occur together in the syntactic database, and both anchor the trees. The verbs in these 
light verb constructions are give and make. The noun following the light verb is (usually) 
a bare infinitive form (e.g. make a promise to Anoop). However, we include deverbal 
nominals (e.g. make a payment to Anoop) as well. Constructions with nouns that do not 
contribute an argument structure are excluded. In addition to semantic considerations 
of light verbs, they differ syntactically from the Ditransitive with PP Shift verbs (see 
section 6.5) as well in that the noun in the light verb construction does not extract. Also,
passivization is severely restricted. Special determiner requirements and restrictions are
handled in the same manner as for the Tnx0lVN1 family. There are 18 verb/noun pairs
that select this family. 



Examples: give look, give wave, make promise 
Dania gave Carl a murderous look . 
Amanda gave us a little wave as she left . 
Dania made Doug a promise . 



Declarative tree: See Figure 6.16.



Other available trees: Non-shifted: wh-moved subject, wh-moved indirect object, subject
relative clause with and without comp, adjunct (gap-less) relative clause with comp/with
PP pied-piping, indirect object relative clause with and without comp/with PP pied-piping,
imperative, NP gerund, passive with by phrase, passive with by-phrase with adjunct
(gap-less) relative clause with comp/with PP pied-piping, gerund passive with by
phrase, gerund passive without by phrase.

Shifted: wh-moved subject, wh-moved object of PP, wh-moved PP, subject relative 
clause with and without comp, object of PP relative clause with and without comp/with 
PP pied-piping, imperative, determiner gerund, NP gerund, passive with by phrase with 
adjunct (gap-less) relative clause with comp/with PP pied-piping, gerund passive with by 
phrase, gerund passive without by phrase.

Figure 6.16: Declarative Light Verbs with PP Tree: αnx0lVN1Pnx2 (a), αnx0lVnx2N1 (b)



6.17 NP It-Cleft: TItVnx1s2



Description: This tree family is selected by be as the main verb and it as the subject. Together 
these two items serve as a multi-component anchor for the tree family. This tree family 
is used for it-clefts in which the clefted element is an NP and there are no gaps in the 
clause which follows the NP. The NP is interpreted as an adjunct of the following clause.
See Chapter 11 for additional discussion.



Examples: it be 

it was yesterday that we had the meeting . 



Declarative tree: See Figure 6.17.




Figure 6.17: Declarative NP It-Cleft Tree: αItVnx1s2



Other available trees: inverted question, wh-moved object with be inverted, wh-moved ob- 
ject with be not inverted, adjunct (gap-less) relative clause with comp/with PP pied- 
piping. 



6.18 PP It-Cleft: TItVpnx1s2

Description: This tree family is selected by be as the main verb and it as the subject. Together 
these two items serve as a multi-component anchor for the tree family. This tree family is 
used for it-clefts in which the clefted element is a PP and there are no gaps in the clause 
which follows the PP. The PP is interpreted as an adjunct of the following clause. See
Chapter 11 for additional discussion.



Examples: it be 

it was at Kent State that the police shot all those students 



Declarative tree: See Figure 6.18.



Figure 6.18: Declarative PP It-Cleft Tree: αItVpnx1s2



Other available trees: inverted question, wh-moved prepositional phrase with be inverted, 
wh-moved prepositional phrase with be not inverted, adjunct (gap-less) relative clause 
with comp/with PP pied-piping. 



6.19 Adverb It-Cleft: TItVad1s2

Description: This tree family is selected by be as the main verb and it as the subject. Together 
these two items serve as a multi-component anchor for the tree family. This tree family 
is used for it-clefts in which the clefted element is an adverb and there are no gaps in the 
clause which follows the adverb. The adverb is interpreted as an adjunct of the following 
clause. See Chapter 11 for additional discussion.

Examples: it be 

it was reluctantly that Dania agreed to do the tech report . 



Declarative tree: See Figure 6.19.



Other available trees: inverted question, wh-moved adverb how with be inverted, wh-moved 
adverb how with be not inverted, adjunct (gap-less) relative clause with comp/with PP 
pied-piping.

Figure 6.19: Declarative Adverb It-Cleft Tree: αItVad1s2

6.20 Adjective Small Clause Tree: Tnx0Ax1

Description: These trees are not anchored by verbs, but by adjectives. They are explained
in much greater detail in the section on small clauses (see section 9.3). This section is
presented here for completeness. 3,244 adjectives select this tree family.

Examples: addictive, dangerous, wary 
cigarettes are addictive . 
smoking cigarettes is dangerous . 

John seems wary of the Surgeon General 's warnings . 

Declarative tree: See Figure 6.20.

Figure 6.20: Declarative Adjective Small Clause Tree: αnx0Ax1

Other available trees: wh-moved subject, wh-moved adjective how, relative clause on subject 
with and without comp, imperative, NP gerund, adjunct (gap-less) relative clause with 
comp/with PP pied-piping. 



6.21 Adjective Small Clause with Sentential Complement: Tnx0A1s1



Description: This tree family is selected by adjectives that take sentential complements. The
sentential complements can be indicative or infinitive. Note that these trees are anchored
by adjectives, not verbs. Small clauses are explained in much greater detail in section 9.3.
This section is presented here for completeness. 669 adjectives select this tree family.

Examples: able, curious, disappointed 

Christy was able to find the problem . 

Christy was curious whether the new analysis was working . 
Christy was sad that the old analysis failed . 



Declarative tree: See Figure 6.21.



Figure 6.21: Declarative Adjective Small Clause with Sentential Complement Tree: αnx0A1s1

Other available trees: wh-moved subject, wh-moved adjective how, relative clause on subject 
with and without comp, imperative, NP gerund, adjunct (gap-less) relative clause with 
comp/with PP pied-piping. 

6.22 Adjective Small Clause with Sentential Subject: Ts0Ax1

Description: This tree family is selected by adjectives that take sentential subjects. The
sentential subjects can be indicative or infinitive. Note that these trees are anchored by
adjectives, not verbs. Most adjectives that take the Adjective Small Clause tree family
(see section 6.20) take this family as well (see footnote 6). Small clauses are explained in
much greater detail in section 9.3. This section is presented here for completeness. 3,185
adjectives select this tree family.

Examples: decadent, incredible, uncertain 

to eat raspberry chocolate truffle ice cream is decadent . 

that Carl could eat a large bowl of it is incredible . 

whether he will actually survive the experience is uncertain . 



Declarative tree: See Figure 6.22.



Other available trees: wh-moved subject, wh-moved adjective, adjunct (gap-less) relative 
clause with comp/with PP pied-piping. 



6 No great attempt has been made to go through and decide which adjectives should actually take this family 
and which should not.

Figure 6.22: Declarative Adjective Small Clause with Sentential Subject Tree: αs0Ax1



6.23 Equative BE: Tnx0BEnx1



Description: This tree family is selected only by the verb be. It is distinguished from the
predicative NP's (see section 6.24) in that two NP's are equated, and hence interchangeable
(see Chapter 9 for more discussion on the English copula and predicative sentences).
The XTAG analysis for equative be is explained in greater detail in that chapter.



Examples: be 

That man is my uncle. 



Declarative tree: See Figure 6.23.



Figure 6.23: Declarative Equative BE Tree: αnx0BEnx1



Other available trees: inverted-question. 



6.24 NP Small Clause: Tnx0N1



Description: The trees in this tree family are not anchored by verbs, but by nouns. Small
clauses are explained in much greater detail in section 9.3. This section is presented here
for completeness. 5,595 nouns select this tree family.



Examples: author, chair, dish
Dania is an author . 

that blue, warped-looking thing is a chair 
those broken pieces were dishes . 



Declarative tree: See Figure 6.24.




Figure 6.24: Declarative NP Small Clause Trees: αnx0N1

Other available trees: wh-moved subject, wh-moved object, relative clause on subject with 
and without comp, imperative, NP gerund, adjunct (gap-less) relative clause with comp/with 
PP pied-piping. 

6.25 NP Small Clause with Sentential Complement: Tnx0N1s1

Description: This tree family is selected by the small group of nouns that take sentential com- 
plements by themselves (see section [T^). The sentential complements can be indicative or 
infinitive, depending on the noun. Small clauses in general are explained in much greater
detail in section 9.3. This section is presented here for completeness. 141 nouns select
this family. 

Examples: admission, claim, vow 

The affidavits are admissions that they killed the sheep . 
there is always the claim that they were insane . 
this is his vow to fight the charges . 

Declarative tree: See Figure 6.25.

Other available trees: wh-moved subject, wh-moved object, relative clause on subject with 
and without comp, imperative, NP gerund, adjunct (gap-less) relative clause with comp/with
PP pied-piping. 

Figure 6.25: Declarative NP with Sentential Complement Small Clause Tree: αnx0N1s1

6.26 NP Small Clause with Sentential Subject: Ts0N1

Description: This tree family is selected by nouns that take sentential subjects. The sentential
subjects can be indicative or infinitive. Note that these trees are anchored by nouns, not
verbs. Most nouns that take the NP Small Clause tree family (see section 6.24) take this
family as well (see footnote 7). Small clauses are explained in much greater detail in
section 9.3. This section is presented here for completeness. 5,519 nouns select this tree
family.



Examples: dilemma, insanity, tragedy 

whether to keep the job he hates is a dilemma . 
to invest all of your money in worms is insanity . 
that the worms died is a tragedy . 



Declarative tree: See Figure 6.26. 



Figure 6.26: Declarative NP Small Clause with Sentential Subject Tree: αs0N1



Other available trees: wh-moved subject, adjunct (gap-less) relative clause with comp/with 
PP pied-piping. 



6.27 PP Small Clause: Tnx0Pnx1

Description: This family is selected by prepositions that can occur in small clause
constructions. For more information on small clause constructions, see section 9.3. This section
is presented here for completeness. 39 prepositions select this tree family.

7 No great attempt has been made to go through and decide which nouns should actually take this family and 
which should not. 



Examples: around, in, underneath
Chris is around the corner . 
Trisha is in big trouble . 
The dog is underneath the table . 



Declarative tree: See Figure 6.27.



Figure 6.27: Declarative PP Small Clause Tree: αnx0Pnx1



Other available trees: wh-moved subject, wh-moved object of PP, relative clause on subject 
with and without comp, relative clause on object of PP with and without comp/with PP 
pied-piping, imperative, NP gerund, adjunct (gap-less) relative clause with comp/with 
PP pied-piping. 



6.28 Exhaustive PP Small Clause: Tnx0Px1

Description: This family is selected by exhaustive prepositions that can occur in small
clauses. Exhaustive prepositions are prepositions that function as prepositional phrases
by themselves. For more information on small clause constructions, please see section 9.3.
This section is included here for completeness. 33 exhaustive prepositions select this tree
family.

Examples: abroad, below, outside 
Dr. Joshi is abroad . 
The workers are all below . 
Clove is outside . 



Declarative tree: See Figure 6.28.



Other available trees: wh-moved subject, wh-moved PP, relative clause on subject with and 
without comp, imperative, NP gerund, adjunct (gap-less) relative clause with comp/with 
PP pied-piping. 



Figure 6.28: Declarative Exhaustive PP Small Clause Tree: αnx0Px1

6.29 PP Small Clause with Sentential Subject: Ts0Pnx1

Description: This tree family is selected by prepositions that take sentential subjects. The
sentential subject can be indicative or infinitive. Small clauses are explained in much
greater detail in section 9.3. This section is presented here for completeness. 39
prepositions select this tree family.

Examples: beyond, unlike 

that Ken could forget to pay the taxes is beyond belief . 

to explain how this happened is outside the scope of this discussion . 

for Ken to do something right is unlike him . 



Declarative tree: See Figure 6.29.




Figure 6.29: Declarative PP Small Clause with Sentential Subject Tree: αs0Pnx1



Other available trees: wh-moved subject, relative clause on object of the PP with and with- 
out comp/with PP pied-piping, adjunct (gap-less) relative clause with comp/with PP 
pied-piping. 



6.30 Intransitive Sentential Subject: Ts0V

Description: Only the verb matter selects this tree family. The sentential subject can be 
indicative (complementizer required) or infinitive (complementizer optional). 

Examples: matter 

to arrive on time matters considerably . 

that Joshi attends the meetings matters to everyone . 



Declarative tree: See Figure 6.30.

Figure 6.30: Declarative Intransitive Sentential Subject Tree: αs0V



Other available trees: wh-moved subject, adjunct (gap-less) relative clause with comp/with 
PP pied-piping. 



6.31 Sentential Subject with 'to' complement: Ts0Vtonx1

Description: The verbs that select this tree family are fall, occur and leak. The sentential sub- 
ject can be indicative (complementizer required) or infinitive (complementizer optional). 

Examples: fall, occur, leak 

to wash the car fell to the children . 

that he should leave occurred to the party crasher . 

whether the princess divorced the prince leaked to the press . 



Declarative tree: See Figure 6.31. 




Figure 6.31: Sentential Subject Tree with 'to' complement: αs0Vtonx1



Other available trees: wh-moved subject, adjunct (gap-less) relative clause with comp/with 
PP pied-piping. 



6.32 PP Small Clause, with Adv and Prep anchors: Tnx0ARBPnx1



Description: This family is selected by multi-word prepositions that can occur in small clause
constructions. In particular, this family is selected by two-word prepositions, where the
first word is an adverb, the second word a preposition. Both components of the multi-word
preposition are anchors. For more information on small clause constructions, see
section 9.3. 8 multi-word prepositions select this tree family.



Examples: ahead of, close to 

The little girl is ahead of everyone else in the race 
The project is close to completion . 



Declarative tree: See Figure 6.32.



Figure 6.32: Declarative PP Small Clause tree with two-word preposition, where the first word
is an adverb, and the second word is a preposition: αnx0ARBPnx1



Other available trees: wh-moved subject, wh-moved object of PP, relative clause on subject 
with and without comp, relative clause on object of PP with and without comp, adjunct 
(gap-less) relative clause with comp/with PP pied-piping, imperative, NP Gerund. 



6.33 PP Small Clause, with Adj and Prep anchors: Tnx0APnx1

Description: This family is selected by multi-word prepositions that can occur in small clause 
constructions. In particular, this family is selected by two-word prepositions, where the 
first word is an adjective, the second word a preposition. Both components of the multi- 
word preposition are anchors. For more information on small clause constructions, see 
section 9.3. 8 multi-word prepositions select this tree family.

Examples: according to, void of 

The operation we performed was according to standard procedure . 
He is void of all feeling . 



Declarative tree: See Figure 6.33.

Other available trees: wh-moved subject, relative clause on subject with and without comp, 
relative clause on object of PP with and without comp, wh-moved object of PP, adjunct 
(gap-less) relative clause with comp/with PP pied-piping. 



Figure 6.33: Declarative PP Small Clause tree with two-word preposition, where the first word
is an adjective, and the second word is a preposition: αnx0APnx1

6.34 PP Small Clause, with Noun and Prep anchors: Tnx0NPnx1

Description: This family is selected by multi-word prepositions that can occur in small clause 
constructions. In particular, this family is selected by two-word prepositions, where the 
first word is a noun, the second word a preposition. Both components of the multi- 
word preposition are anchors. For more information on small clause constructions, see 
section 9.3. 1 multi-word preposition selects this tree family.

Examples: thanks to 

The fact that we are here tonight is thanks to the valiant efforts of our staff . 



Declarative tree: See Figure 6.34.



Figure 6.34: Declarative PP Small Clause tree with two-word preposition, where the first word
is a noun, and the second word is a preposition: αnx0NPnx1



Other available trees: wh-moved subject, wh-moved object of PP, relative clause on subject
with and without comp, relative clause on object with comp, adjunct (gap-less) relative
clause with comp/with PP pied-piping.



6.35 PP Small Clause, with Prep anchors: Tnx0PPnx1

Description: This family is selected by multi-word prepositions that can occur in small clause
constructions. In particular, this family is selected by two-word prepositions, where both
words are prepositions. Both components of the multi-word preposition are anchors. For
more information on small clause constructions, see section 9.3. 9 multi-word prepositions
select this tree family.

Examples: on to, inside of 

that detective is on to you . 

The red box is inside of the blue box . 



Declarative tree: See Figure 6.35.



Figure 6.35: Declarative PP Small Clause tree with two-word preposition, where both words
are prepositions: αnx0PPnx1



Other available trees: wh-moved subject, wh-moved object of PP, relative clause on subject
with and without comp, relative clause on object of PP with and without comp/with PP
pied-piping, imperative, adjunct (gap-less) relative clause with comp/with PP
pied-piping.



6.36 PP Small Clause, with Prep and Noun anchors: Tnx0PNaPnx1

Description: This family is selected by multi-word prepositions that can occur in small clause
constructions. In particular, this family is selected by three-word prepositions. The first
and third words are always prepositions, and the middle word is a noun. The noun is
marked for null adjunction since it cannot be modified by noun modifiers. All three
components of the multi-word preposition are anchors. For more information on small
clause constructions, see section 9.3. 24 multi-word prepositions select this tree family.






Examples: in back of, in line with, on top of 

The red plaid box should be in back of the plain black box . 
The evidence is in line with my newly concocted theory . 
She is on top of the world . 
*She is on direct top of the world . 
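
The effect of marking the middle noun for null adjunction can be illustrated with a small sketch. This is not the XTAG implementation: the `Node` class, the `na` flag and the `can_adjoin` helper are hypothetical names introduced only for illustration, and the flat P-N-P structure is a simplification. The sketch shows that an auxiliary tree rooted in N (such as one contributed by an adjective modifier) is rejected at a node carrying the NA constraint, which is what rules out the starred example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in an elementary tree (hypothetical, simplified representation)."""
    label: str                 # syntactic category, e.g. "N" or "P"
    na: bool = False           # True if the node carries a null-adjunction (NA) constraint
    children: List["Node"] = field(default_factory=list)


def can_adjoin(aux_root_label: str, target: Node) -> bool:
    """An auxiliary tree may adjoin only at a node of the same category
    that is not marked NA."""
    return target.label == aux_root_label and not target.na


# Anchors of the three-word preposition "on top of": P, N (marked NA), P.
p1 = Node("P")
n = Node("N", na=True)
p2 = Node("P")
complex_p = Node("P", children=[p1, n, p2])

# An ordinary noun elsewhere in the grammar has no NA constraint.
plain_noun = Node("N")

print(can_adjoin("N", plain_noun))  # an N-rooted modifier tree is allowed here
print(can_adjoin("N", n))           # blocked, as in *on direct top of
```

Keeping the constraint on the node rather than in the lexicon means the same check applies to every tree in the family without listing the excluded modifiers.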



Declarative tree: See Figure 6.36.

Figure 6.36: Declarative PP Small Clause tree with three-word preposition, where the middle
noun is marked for null adjunction: αnx0PNaPnx1



Other available trees: wh-moved subject, wh-moved object of PP, relative clause on subject 
with and without comp, relative clause on object of PP with and without comp/with PP 
pied-piping, adjunct (gap-less) relative clause with comp/with PP pied-piping, imperative,
NP Gerund. 



6.37 PP Small Clause with Sentential Subject, and Adv and
Prep anchors: Ts0ARBPnx1

Description: This tree family is selected by multi-word prepositions that take sentential
subjects. In particular, this family is selected by two-word prepositions, where the first word
is an adverb, the second word a preposition. Both components of the multi-word
preposition are anchors. The sentential subject can be indicative or infinitive. Small clauses are
explained in much greater detail in section 9.3. 2 prepositions select this tree family.



Examples: due to, contrary to 

that David slept until noon is due to the fact that he never sleeps during the week . 
that Michael 's joke was funny is contrary to the usual status of his comic attempts . 



Declarative tree: See Figure 6.37.

Figure 6.37: Declarative PP Small Clause with Sentential Subject Tree, with two-word
preposition, where the first word is an adverb, and the second word is a preposition: αs0ARBPnx1



Other available trees: wh-moved subject, relative clause on object of the PP with and with- 
out comp, adjunct (gap-less) relative clause with comp/with PP pied-piping. 



6.38 PP Small Clause with Sentential Subject, and Adj and
Prep anchors: Ts0APnx1

Description: This tree family is selected by multi-word prepositions that take sentential
subjects. In particular, this family is selected by two-word prepositions, where the first word
is an adjective, the second word a preposition. Both components of the multi-word
preposition are anchors. The sentential subject can be indicative or infinitive. Small clauses
are explained in much greater detail in section 9.3. 5 prepositions select this tree family.



Examples: devoid of, according to

that he could walk out on her is devoid of all reason . 

that the conversation erupted precisely at that moment was according to my theory . 



Declarative tree: See Figure 6.38.



Other available trees: wh-moved subject, relative clause on object of the PP with and with- 
out comp, adjunct (gap-less) relative clause with comp/with PP pied-piping. 



Figure 6.38: Declarative PP Small Clause with Sentential Subject Tree, with two-word
preposition, where the first word is an adjective, and the second word is a preposition: αs0APnx1

6.39 PP Small Clause with Sentential Subject, and Noun and
Prep anchors: Ts0NPnx1

Description: This tree family is selected by multi-word prepositions that take sentential
subjects. In particular, this family is selected by two-word prepositions, where the first word
is a noun, the second word a preposition. Both components of the multi-word
preposition are anchors. The sentential subject can be indicative or infinitive. Small clauses are
explained in much greater detail in section 9.3. 1 preposition selects this tree family.



Examples: thanks to 

that she is worn out is thanks to a long day in front of the computer terminal . 



Declarative tree: See Figure 6.39.

Figure 6.39: Declarative PP Small Clause with Sentential Subject Tree, with two-word
preposition, where the first word is a noun, and the second word is a preposition: αs0NPnx1



Other available trees: wh-moved subject, relative clause on object of the PP with and
without comp, adjunct (gap-less) relative clause with comp/with PP pied-piping.

6.40 PP Small Clause with Sentential Subject, and Prep anchors: Ts0PPnx1

Description: This tree family is selected by multi-word prepositions that take sentential
subjects. In particular, this family is selected by two-word prepositions, where both words are
prepositions. Both components of the multi-word preposition are anchors. The sentential
subject can be indicative or infinitive. Small clauses are explained in much greater detail
in section 9.3. 3 prepositions select this tree family.



Examples: outside of 

that Mary did not complete the task on time is outside of the scope of this discussion . 



Declarative tree: See Figure 6.40.




Figure 6.40: Declarative PP Small Clause with Sentential Subject Tree, with two-word
preposition, where both words are prepositions: αs0PPnx1

Other available trees: wh-moved subject, relative clause on object of the PP with and with- 
out comp, adjunct (gap-less) relative clause with comp/with PP pied-piping. 

6.41 PP Small Clause with Sentential Subject, and Prep and
Noun anchors: Ts0PNaPnx1



Description: This tree family is selected by multi-word prepositions that take sentential
subjects. In particular, this family is selected by three-word prepositions. The first and third
words are always prepositions, and the middle word is a noun. The noun is marked for
null adjunction since it cannot be modified by noun modifiers. All three components
of the multi-word preposition are anchors. Small clauses are explained in much greater
detail in section 9.3. 9 prepositions select this tree family.



Examples: on account of, in support of

that Joe had to leave the beach was on account of the hurricane .
that Maria could not come is in support of my theory about her .
*that Maria could not come is in direct/strict/desperate support of my theory about her .



Declarative tree: See Figure 6.41.

Figure 6.41: Declarative PP Small Clause with Sentential Subject Tree, with three-word
preposition, where the middle noun is marked for null adjunction: αs0PNaPnx1



Other available trees: wh-moved subject, relative clause on object of the PP with and with- 
out comp, adjunct (gap-less) relative clause with comp/with PP pied-piping. 



6.42 Predicative Adjective with Sentential Subject and Complement: Ts0A1s1

Description: This tree family is selected by predicative adjectives that take sentential subjects 
and a sentential complement. This tree family is selected by likely and certain. 

Examples: likely, certain 

that Max continues to drive a Jaguar is certain to make Bill jealous . 
for the Jaguar to be towed seems likely to make Max very angry . 



Declarative tree: See Figure 6.42.



Other available trees: wh-moved subject, adjunct (gap-less) relative clause with comp/with 
PP pied-piping. 



Figure 6.42: Predicative Adjective with Sentential Subject and Complement: αs0A1s1

6.43 Locative Small Clause with Ad anchor: Tnx0nx1ARB

Description: These trees are not anchored by verbs, but by adverbs that are part of locative
adverbial phrases. Locatives are explained in much greater detail in the section on the
locative modifier trees (see section 19.6). The only remarkable aspect of this tree family is
the wh-moved locative tree, αW1nx0nx1ARB, shown in Figure 6.44. This is the only tree
family with this type of transformation, in which the entire adverbial phrase is wh-moved
but not all elements are replaced by wh items (as in how many city blocks away is the
record store?). Locatives that consist of just the locative adverb or the locative adverb
and a degree adverb (see Section 19.6 for details) are treated as exhaustive PPs and
therefore select that tree family (Section 6.28) when used predicatively. For an extensive
description of small clauses, see Section 9.3. 26 adverbs select this tree family.



Examples: ahead, offshore, behind 
the crash is three blocks ahead 
the naval battle was many kilometers offshore 
how many blocks behind was Max? 



Declarative tree: See Figure 6.43.

Figure 6.43: Declarative Locative Adverbial Small Clause Tree: αnx0nx1ARB



Other available trees: wh-moved subject, relative clause on subject with and without comp, 
wh-moved locative, imperative, NP gerund. 



Figure 6.44: Wh-moved Locative Small Clause Tree: αW1nx0nx1ARB



6.44 Exceptional Case Marking: TXnx0Vs1

Description: This tree family is selected by verbs that are classified as exceptional case
marking, meaning that the verb assigns accusative case to the subject of the sentential
complement. This is in contrast to verbs in the Tnx0Vnx1s2 family (section 6.6), which assign
accusative case to an NP which is not part of the sentential complement. ECM verbs take
sentential complements which are either an infinitive or a "bare" infinitive. As with the
Tnx0Vs1 family (section 6.12), the declarative and other trees in the Xnx0Vs1 family are
auxiliary trees, as opposed to the more common initial trees. These auxiliary trees adjoin
onto an S node in an existing tree of the type specified by the sentential complement.
This is the mechanism by which TAGs are able to maintain long-distance dependencies
(see Chapter 13), even over multiple embeddings (e.g. Who did Bill expect to eat beans?
or Who did Bill expect Mary to like?). See section 8.6.1 for details on this family. 20 verbs
select this tree family.
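
The adjunction step described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the XTAG parser: the `Node` class and the `leaves`, `find_foot` and `adjoin` helpers are hypothetical, feature structures are omitted, and the auxiliary verb did is folded into the same auxiliary tree as expect for brevity, which the XTAG grammar itself does not do. The point is only that the wh-word and its clause start out in one initial tree, and the ECM material is inserted between them by adjoining at the inner S node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # set on lexical leaves
    foot: bool = False           # True for the foot node of an auxiliary tree


def leaves(node: Node) -> List[str]:
    """Return the words at the frontier, left to right."""
    if not node.children:
        return [node.word] if node.word else []
    return [w for child in node.children for w in leaves(child)]


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node below `node`, or None if there is none."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, aux_root: Node) -> None:
    """Adjoin the auxiliary tree rooted at aux_root at `target`, in place.

    The subtree below `target` moves under the foot node, and the
    auxiliary tree's material takes its place at `target`.
    """
    foot = find_foot(aux_root)
    if foot is None or foot.label != target.label or aux_root.label != target.label:
        raise ValueError("root, foot and adjunction site must share a label")
    foot.children, foot.word, foot.foot = target.children, target.word, False
    target.children, target.word = aux_root.children, None


def leaf(label: str, word: str) -> Node:
    return Node(label, word=word)


# Initial tree: the wh-word and the clause it belongs to are local to each other.
inner_s = Node("S", [leaf("NP", "Mary"), Node("VP", [leaf("V", "to like")])])
initial = Node("S", [leaf("NP", "who"), inner_s])

# Simplified auxiliary tree for the ECM verb, with foot node S*.
aux = Node("S", [leaf("V", "did"), leaf("NP", "Bill"),
                 Node("VP", [leaf("V", "expect"), Node("S", foot=True)])])

adjoin(inner_s, aux)
print(" ".join(leaves(initial)))
```

Because the dependency between who and like is stated inside a single elementary tree, it survives any number of further adjunctions at the inner S node.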



Examples: expect, see 

Van expects Bob to talk .
Bob sees the harmonica fall .



Declarative tree: See Figure 6.45.




Figure 6.45: ECM Tree: βXnx0Vs1

Other available trees: wh-moved subject, subject relative clause with and without comp,
adjunct (gap-less) relative clause with and without comp/with PP pied-piping, imperative,
NP gerund.



6.45 Idiom with V, D, and N anchors: Tnx0VDN1

Description: This tree family is selected by idiomatic phrases in which the verb, determiner,
and NP are all frozen (as in He kicked the bucket.). Only a limited number of
transformations are allowed, as compared to the normal transitive tree family (see section 6.2).
Other idioms that have the same structure as kick the bucket, and that are limited to
the same transformations, would select this tree, while different tree families are used to
handle other idioms. Note that John kicked the bucket is actually ambiguous, and would
result in two parses: an idiomatic one (meaning that John died), and a compositional
transitive one (meaning that there is a physical bucket that John hit with his foot). 1
idiom selects this family.



Examples: kick the bucket 
Nixon kicked the bucket 
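
The ambiguity noted in the description, where one string selects both the idiom family and the ordinary transitive family, can be sketched as a lexicon lookup. The Python below is a toy illustration only: the `LEXICON` table and the `selected_families` function are hypothetical, the input is assumed to be already lemmatized, and anchors are required to be contiguous, which is a simplification that would not cover transformations such as the passive.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each entry maps a tuple of anchor lemmas to a tree family.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("kick",): "Tnx0Vnx1",                   # ordinary transitive verb (section 6.2)
    ("kick", "the", "bucket"): "Tnx0VDN1",   # idiom with V, D and N anchors
}


def selected_families(lemmas: List[str]) -> List[str]:
    """Return the tree families whose anchors all occur, in order and
    contiguously, in the lemmatized input."""
    found = []
    for anchors, family in LEXICON.items():
        n = len(anchors)
        if any(tuple(lemmas[i:i + n]) == anchors
               for i in range(len(lemmas) - n + 1)):
            found.append(family)
    return sorted(found)


# Both families are selected, so two parses are possible.
print(selected_families(["John", "kick", "the", "bucket"]))
# Only the compositional transitive family is selected.
print(selected_families(["John", "kick", "the", "ball"]))
```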



Declarative tree: See Figure 6.46.

Figure 6.46: Declarative Transitive Idiom Tree: αnx0VDN1



Other available trees: subject relative clause with and without comp, declarative, wh-moved 
subject, imperative, NP gerund, adjunct gapless relative with comp/with PP pied-piping, 
passive, w/wo by-phrase, wh-moved object of by-phrase, wh-moved by-phrase, relative 
(with and without comp) on subject of passive, PP relative. 



6.46 Idiom with V, D, A, and N anchors: Tnx0VDAN1



Description: This tree family is selected by transitive idioms that are anchored by a verb, 
determiner, adjective, and noun. 19 idioms select this family. 



Examples: have a green thumb, sing a different tune

Martha might have a green thumb, but it's uncertain after the death of all the plants. 
After his conversion John sang a different tune. 




Figure 6.47: Declarative Idiom with V, D, A, and N Anchors Tree: αnx0VDAN1



Other available trees: Subject relative clause with and without comp, adjunct relative clause 
with comp/with PP pied-piping, wh-moved subject, imperative, NP gerund, passive with- 
out by phrase, passive with by phrase, passive with wh-moved object of by phrase, passive 
with wh-moved by phrase, passive with relative on object of by phrase with and without 
comp. 



6.47 Idiom with V and N anchors: Tnx0VN1



Description: This tree family is selected by transitive idioms that are anchored by a verb and 
noun. 15 idioms select this family. 



Examples: draw blood, cry wolf 
Graham's retort drew blood. 
The neglected boy cried wolf. 



Declarative tree: See Figure 6.48. 



Other available trees: Subject relative clause with and without comp, adjunct relative clause
with comp/with PP pied-piping, wh-moved subject, imperative, NP gerund, passive without
by phrase, passive with by phrase, passive with wh-moved object of by phrase, passive
with wh-moved by phrase, passive with relative on object of by phrase with and without
comp.

Figure 6.48: Declarative Idiom with V and N Anchors Tree: αnx0VN1



6.48 Idiom with V, A, and N anchors: Tnx0VAN1

Description: This tree family is selected by transitive idioms that are anchored by a verb, 
adjective, and noun. 4 idioms select this family. 

Examples: break new ground, cry bloody murder 
The avant-garde film breaks new ground. 

The investors cried bloody murder after the suspicious takeover. 



Declarative tree: See Figure 6.49.

Figure 6.49: Declarative Idiom with V, A, and N Anchors Tree: αnx0VAN1



Other available trees: Subject relative clause with and without comp, adjunct relative clause
with comp/with PP pied-piping, wh-moved subject, imperative, NP gerund, passive without
by phrase, passive with by phrase, passive with wh-moved object of by phrase, passive
with wh-moved by phrase, passive with relative on object of by phrase with and without
comp.



6.49 Idiom with V, D, A, N, and Prep anchors: Tnx0VDAN1Pnx2

Description: This tree family is selected by transitive idioms that are anchored by a verb, 
determiner, adjective, noun, and preposition. 6 idioms select this family. 

Examples: make a big deal about, make a great show of 

John made a big deal about a minuscule dent in his car.
The company made a big show of paying generous dividends. 




Figure 6.50: Declarative Idiom with V, D, A, N, and Prep Anchors Tree: αnx0VDAN1Pnx2



Other available trees: Subject relative clause with and without comp, adjunct relative clause
with comp/with PP pied-piping, wh-moved subject, imperative, NP gerund, passive without
by phrase, passive with by phrase, passive with wh-moved object of by phrase, passive
with wh-moved by phrase, outer passive without by phrase, outer passive with by phrase,
outer passive with wh-moved by phrase, outer passive with wh-moved object of by phrase,
outer passive without by phrase with relative on the subject with and without comp,
outer passive with by phrase with relative on subject with and without comp.



6.50 Idiom with V, A, N, and Prep anchors: Tnx0VAN1Pnx2

Description: This tree family is selected by transitive idioms that are anchored by a verb, 
adjective, noun, and preposition. 3 idioms select this family. 

Examples: make short work of 

John made short work of the glazed ham. 



Declarative tree: See Figure 6.51.

Figure 6.51: Declarative Idiom with V, A, N, and Prep Anchors Tree: αnx0VAN1Pnx2



Other available trees: Subject relative clause with and without comp, adjunct relative clause
with comp/with PP pied-piping, wh-moved subject, imperative, NP gerund, passive without
by phrase, passive with by phrase, passive with wh-moved object of by phrase, passive
with wh-moved by phrase, outer passive without by phrase, outer passive with by phrase,
outer passive with wh-moved by phrase, outer passive with wh-moved object of by phrase,
outer passive without by phrase with relative on the subject with and without comp,
outer passive with by phrase with relative on subject with and without comp.



6.51 Idiom with V, N, and Prep anchors: Tnx0VN1Pnx2

Description: This tree family is selected by transitive idioms that are anchored by a verb, 
noun, and preposition. 6 idioms select this family. 

Examples: look daggers at, keep track of 

Maria looked daggers at her ex-husband across the courtroom. 
The company kept track of its inventory. 



Declarative tree: See Figure 6.52.



Other available trees: Subject relative clause with and without comp, adjunct relative clause
with comp/with PP pied-piping, wh-moved subject, imperative, NP gerund, passive without
by phrase, passive with by phrase, passive with wh-moved object of by phrase, passive
with wh-moved by phrase, outer passive without by phrase, outer passive with by phrase,
outer passive with wh-moved by phrase, outer passive with wh-moved object of by phrase,
outer passive without by phrase with relative on the subject with and without comp,
outer passive with by phrase with relative on subject with and without comp.

Figure 6.52: Declarative Idiom with V, N, and Prep Anchors Tree: αnx0VN1Pnx2



6.52 Idiom with V, D, N, and Prep anchors: Tnx0VDN1Pnx2

Description: This tree family is selected by transitive idioms that are anchored by a verb, 
determiner, noun, and preposition. 17 idioms select this family. 

Examples: make a mess of, keep the lid on 
John made a mess out of his new suit. 

The tabloid didn't keep a lid on the imminent celebrity nuptials. 




Figure 6.53: Declarative Idiom with V, D, N, and Prep Anchors Tree: αnx0VDN1Pnx2



Other available trees: Subject relative clause with and without comp, adjunct relative clause
with comp/with PP pied-piping, wh-moved subject, imperative, NP gerund, passive without
by phrase, passive with by phrase, passive with wh-moved object of by phrase, passive
with wh-moved by phrase, outer passive without by phrase, outer passive with by phrase,
outer passive with wh-moved by phrase, outer passive with wh-moved object of by phrase,
outer passive without by phrase with relative on the subject with and without comp,
outer passive with by phrase with relative on subject with and without comp.



Chapter 7 

Ergatives 



Verbs in English that are termed ergative display the kind of alternation shown in the sentences 
in (7) below. 

(7) The sun melted the ice . 
The ice melted . 

The pattern of ergative pairs as seen in (7) is for the object of the transitive sentence to 
be the subject of the intransitive sentence. The literature discussing such pairs is based largely 
on syntactic models that involve movement, particularly GB. Within that framework two basic 
approaches are discussed: 

• Derived Intransitive

The intransitive member of the ergative pair is derived through processes of movement
and deletion from:

- a transitive D-structure [Burzio, 1986]; or

- transitive lexical structure [Hale and Keyser, 1986; Hale and Keyser, 1987]

• Pure Intransitive

The intransitive member is intransitive at all levels of the syntax and the lexicon and is
not related to the transitive member syntactically or lexically [Napoli, 1988].



The Derived Intransitive approach's notions of movement in the lexicon or in the grammar 
are not represented as such in the XTAG grammar. However, distinctions drawn in these ar- 
guments can be translated to the FB-LTAG framework. In the XTAG grammar the difference 
between these two approaches is not a matter of movement but rather a question of tree fam- 
ily membership. The relation between sentences represented in terms of movement in other 
frameworks is represented in XTAG by membership in the same tree family. Wh-questions and 
their indicative counterparts are one example of this. Adopting the Pure Intransitive approach
suggested by [Napoli, 1988] would mean placing the intransitive ergatives in a tree family with
other intransitive verbs and separate from the transitive variants of the same verbs. This would
result in a grammar that represented intransitive ergatives as more closely related to other
intransitives than to their transitive counterparts. The only hint of the relation between the
intransitive ergatives and the transitive ergatives would be that ergative verbs would select
both tree families. While this is a workable solution, it is an unattractive one for the English
XTAG grammar because semantic coherence is implicitly associated with tree families in our
analysis of other constructions. In particular, constancy in thematic role is represented by
constancy in node names across sentence types within a tree family. For example, if the object
of a declarative tree is NP1, the subject of the passive tree(s) in that family will also be NP1.

The analysis that has been implemented in the English XTAG grammar is an adaptation
of the Derived Intransitive approach. The ergative verbs select one family, Tnx0Vnx1, that
contains both transitive and intransitive trees. The <trans> feature appears on the intransitive
ergative trees with the value - and on the transitive trees with the value +. This creates the
two possibilities needed to account for the data.

• intransitive ergative/transitive alternation. These verbs have transitive and intran- 
sitive variants as shown in sentences (8) and (9). 

(8) The sun melted the ice cream . 

(9) The ice cream melted . 

In the English XTAG grammar, verbs with this behavior are left unspecified as to value
for the <trans> feature. This lack of specification allows these verbs to anchor either
type of tree in the Tnx0Vnx1 tree family because the unspecified <trans> value of the
verb can unify with either + or - values in the trees.

• transitive only. Verbs of this type select only the transitive trees and do not allow
intransitive ergative variants, as in the pattern shown in sentences (10) and (11).

(10) Elmo borrowed a book . 

(11) *A book borrowed . 

The restriction to selecting only transitive trees is accomplished by setting the <trans> 
feature value to + for these verbs. 
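
The effect of leaving <trans> unspecified can be illustrated with a small sketch. This is not the XTAG implementation: it assumes atomic feature values, uses None for "unspecified", and the names unify_atomic, can_anchor, TREE_TRANS and VERB_TRANS are hypothetical, introduced only for illustration.

```python
from typing import Optional


def unify_atomic(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Unify two atomic feature values; None stands for 'unspecified'.

    Raises ValueError when both values are specified and differ.
    """
    if a is None:
        return b
    if b is None or a == b:
        return a
    raise ValueError(f"feature clash: {a} vs {b}")


# <trans> values on the two kinds of trees in the family (as described in the text).
TREE_TRANS = {"transitive": "+", "intransitive_ergative": "-"}

# Illustrative lexical entries (assumption): melt leaves <trans> unspecified,
# borrow sets it to +.
VERB_TRANS = {"melt": None, "borrow": "+"}


def can_anchor(verb: str, tree: str) -> bool:
    """True if the verb's <trans> value unifies with the tree's <trans> value."""
    try:
        unify_atomic(VERB_TRANS[verb], TREE_TRANS[tree])
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for verb in VERB_TRANS:
        for tree in TREE_TRANS:
            print(verb, tree, can_anchor(verb, tree))
```

Under these assumptions melt anchors both tree types, while borrow is rejected by the intransitive ergative tree, matching the contrast between (9) and (11).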




[Figure: ergative tree with a subject NP, a VP, and the verb anchor V marked trans : -]

Figure 7.1: Ergative Tree: αEnx1V



The declarative ergative tree is shown in Figure 7.1 with the <trans> feature displayed.
Note that the index of the subject NP indicates that it originated as the object of the verb. 



Chapter 8 

Sentential Subjects and Sentential 
Complements 

In the XTAG grammar, arguments of a lexical item, including subjects, appear in the initial tree 
anchored by that lexical item. A sentential argument appears as an S node in the appropriate 
position within an elementary tree anchored by the lexical item that selects it. This is the case 
for sentential complements of verbs, prepositions and nouns and for sentential subjects. The 
distribution of complementizers in English is intertwined with the distribution of embedded 
sentences. A successful analysis of complementizers in English must handle both the cooccur- 
rence restrictions between complementizers and various types of clauses, and the distribution 
of the clauses themselves, in both subject and complement positions. 



8.1 S or VP complements? 

Two comparable grammatical formalisms, Generalized Phrase Structure Grammar (GPSG)
[Gazdar et al., 1985] and Head-driven Phrase Structure Grammar (HPSG) [Pollard and Sag,
1994], have rather different treatments of sentential complements (S-comps). They both treat
embedded sentences as VP's with subjects, which generates the correct structures but misses
the generalization that S's behave similarly in both matrix and embedded environments, and
VP's behave quite differently. Neither account has PRO subjects of infinitival clauses; they
have subjectless VP's instead. GPSG has a complete complementizer system, which appears
to cover the same range of data as our analysis. It is not clear what sort of complementizer
analysis could be implemented in HPSG.

Following the standard GB approach, the English XTAG grammar does not allow VP
complements but treats verb-anchored structures without overt subjects as having PRO subjects.
Thus, indicative clauses, infinitives and gerunds all have a uniform treatment as embedded
clauses using the same trees under this approach. Furthermore, our analysis is able to preserve
the selectional and distributional distinction between S's and VP's, in the spirit of GB theories,
without having to posit 'extra' empty categories.1 Consider the alternation between that and
the null complementizer,2 shown in sentences (12) and (13).



1 i.e. empty complementizers. We do have PRO and NP traces in the grammar. 

2 Although we will continue to refer to 'null' complementizers, in our analysis this is actually the absence of
a complementizer.

(12) He hopes Muriel wins.



(13) He hopes that Muriel wins. 

In GB both Muriel wins in (12) and that Muriel wins in (13) are CPs even though there 
is no overt complementizer to head the phrase in (12). Our grammar does not distinguish by 
category label between the phrases that would be labeled in GB as IP and CP. We label both 
of these phrases S. The difference between these two levels is the presence or absence of the 
complementizer (or extracted WH constituent), and is represented in our system as a difference 
in feature values (here, of the <comp> feature), and the presence of the additional structure 
contributed by the complementizer or extracted constituent. This illustrates an important 
distinction in XTAG, that between features and node labels. Because we have a sophisticated 
feature system, we are able to make fine-grained distinctions between nodes with the same label 
which in another system might have to be realized by using distinguishing node labels. 



8.2 Complementizers and Embedded Clauses in English: The 
Data 

Verbs selecting sentential complements (or subjects) place restrictions on their complements,
in particular, on the form of the embedded verb phrase.3 Furthermore, complementizers are
constrained to appear with certain types of clauses, again, based primarily on the form of the
embedded VP. For example, hope selects both indicative and infinitival complements. With
an indicative complement, it may only have that or null as possible complementizers; with an
infinitival complement, it may only have a null complementizer. Verbs that allow wh+
complementizers, such as ask, can take whether and if as complementizers. The possible combinations
of complementizers and clause types are summarized in Table 8.1.



As can be seen in Table 8.1, sentential subjects differ from sentential complements in re- 
quiring the complementizer that for all indicative and subjunctive clauses. In sentential com- 
plements, that often varies freely with a null complementizer, as illustrated in (14)-(19). 

(14) Christy hopes that Mike wins. 



(15) Christy hopes Mike wins. 



(16) Dania thinks that Newt is a liar. 



(17) Dania thinks Newt is a liar. 



(18) That Helms won so easily annoyed me. 



(19) *Helms won so easily annoyed me.
                               Complementizer:
Clause type                    that   whether   if    for   null
indicative     subject         Yes    Yes       No    No    No
indicative     complement      Yes    Yes       Yes   No    Yes
infinitive     subject         No     Yes       No    Yes   Yes
infinitive     complement      No     Yes       No    Yes   Yes
subjunctive    subject         Yes    No        No    No    No
subjunctive    complement      Yes    No        No    No    Yes
gerundive 4    complement      No     No        No    No    Yes
base           complement      No     No        No    No    Yes
small clause   complement      No     No        No    No    Yes



Table 8.1: Summary of Complementizer and Clause Combinations 
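
As an illustration only, the combinations in Table 8.1 can be transcribed into a lookup structure. The sketch below is a hypothetical encoding (the names TABLE_8_1 and allowed are not part of the XTAG system); it simply restates the Yes cells of the table so that a combination can be checked programmatically.

```python
COMPLEMENTIZERS = ("that", "whether", "if", "for", "null")

# (clause type, position) -> complementizers marked "Yes" in Table 8.1.
TABLE_8_1 = {
    ("indicative", "subject"): {"that", "whether"},
    ("indicative", "complement"): {"that", "whether", "if", "null"},
    ("infinitive", "subject"): {"whether", "for", "null"},
    ("infinitive", "complement"): {"whether", "for", "null"},
    ("subjunctive", "subject"): {"that"},
    ("subjunctive", "complement"): {"that", "null"},
    ("gerundive", "complement"): {"null"},
    ("base", "complement"): {"null"},
    ("small clause", "complement"): {"null"},
}


def allowed(clause_type: str, position: str, comp: str) -> bool:
    """Return True if Table 8.1 marks this combination with "Yes".

    Combinations that have no row in the table are treated as not allowed.
    """
    if comp not in COMPLEMENTIZERS:
        raise ValueError(f"unknown complementizer: {comp}")
    return comp in TABLE_8_1.get((clause_type, position), set())


if __name__ == "__main__":
    # (18) vs. (19): an indicative sentential subject needs an overt complementizer.
    print(allowed("indicative", "subject", "that"))
    print(allowed("indicative", "subject", "null"))
```

The table records only which combinations exist in the grammar; which of them a particular verb accepts is a separate lexical matter.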



Another fact which must be accounted for in the analysis is that in infinitival clauses,
the complementizer for must appear with an overt subject NP, whereas a complementizer-less
infinitival clause never has an overt subject, as shown in (20)-(23). (See section 8.5 for more
discussion of the case assignment issues relating to this construction.)

(20) To lose would be awful. 



(21) For Penn to lose would be awful. 



(22) *For to lose would be awful. 



(23) *Penn to lose would be awful. 

In addition, some verbs select <wh>=+ complements (either questions or clauses with
whether or if) [Grimshaw, 1990]:

(24) Jesse wondered who left. 



(25) Jesse wondered if Barry left. 



(26) Jesse wondered whether to leave.

3 Other considerations, such as the relationship between the tense/aspect of the matrix clause and the
tense/aspect of a complement clause, are also important but are not addressed in the current English
XTAG grammar.

4 Most gerundive phrases are treated as NP's. In fact, all gerundive subjects are treated as NP's, and the only 
gerundive complements which receive a sentential parse are those for which there is no corresponding NP parse. 
This was done to reduce duplication of parses. See Chapter llTI for further discussion of gerunds. 
(27) Jesse wondered whether Barry left.



(28) *Jesse thought who left. 



(29) *Jesse thought if Barry left. 



(30) *Jesse thought whether to leave. 



(31) *Jesse thought whether Barry left. 



8.3 Features Required 

As we have seen above, clauses may be <wh>=+ or <wh>=-, may have one of several
complementizers or no complementizer, and can be of various clause types. The XTAG analysis
uses three features to capture these possibilities: <comp> for the variation in
complementizers, <wh> for the question vs. non-question alternation and <mode> for clause types.5
In addition to these three features, the <assign-comp> feature represents complementizer
requirements of the embedded verb. More detailed discussion of the <assign-comp> feature
appears below in the discussions of sentential subjects and of infinitives. The four features and
their possible values are shown in Table 8.2.



Feature          Values
<comp>           that, if, whether, for, rel, nil
<mode>           ind, inf, subjnt, ger, base, ppart, nom/prep
<assign-comp>    that, if, whether, for, rel, ind_nil, inf_nil
<wh>             +, -



Table 8.2: Summary of Relevant Features 



8.4 Distribution of Complementizers 



Like other non-arguments, complementizers anchor an auxiliary tree (shown in Figure 8.1)
and adjoin to elementary clausal trees. The auxiliary tree for complementizers is the only
alternative to having a complementizer position 'built into' every sentential tree. The latter
choice would mean having an empty complementizer substitute into every matrix sentence and
a complementizerless embedded sentence to fill the substitution node. Our choice follows the
XTAG principle that initial trees consist only of the arguments of the anchor.6 The S tree does
not contain a slot for a complementizer, and the βCOMP tree has only one argument, an S
with particular features determined by the complementizer. Complementizers select the type
of clause to which they adjoin through constraints on the <mode> feature of the S foot node
in the tree shown in Figure 8.1. These features also pass up to the root node, so that they are
'visible' to the tree where the embedded sentence adjoins/substitutes.

[Figure: auxiliary tree with a root S node, the anchor Comp (that), and an S foot node marked NA; the feature structures on the root and foot nodes give values for comp, inv, displ-const, wh, mode, assign-comp and sub-conj]

Figure 8.1: Tree βCOMPs, anchored by that

The grammar handles the following complementizers: that, whether, if, for, and no comple- 
mentizer, and the clause types: indicative, infinitival, gerundive, past participial, subjunctive 
and small clause (nom/prep). The <comp> feature in a clausal tree reflects the value of the 
complementizer if one has adjoined to the clause. 

The <comp> and <wh> features receive their root node values from the particular
complementizer which anchors the tree. The βCOMPs tree adjoins to an S node with the feature
<comp>=nil; this feature indicates that the tree does not already have a complementizer
adjoined to it.7 We ensure that there are no stacked complementizers by requiring the foot
node of βCOMPs to have <comp>=nil.
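
The ban on stacked complementizers can be sketched as follows. This is a simplified model rather than the XTAG implementation: a clause is reduced to a dictionary of atomic features, and adjoin_complementizer and is_valid_root are hypothetical names. The root check mirrors the parser's end-of-derivation test that a root S has <comp>=nil.

```python
def adjoin_complementizer(clause: dict, comp: str) -> dict:
    """Return a copy of the clause with the complementizer adjoined.

    The foot node of the complementizer tree requires <comp>=nil, so
    adjoining to a clause that already has a complementizer fails.
    """
    if clause.get("comp", "nil") != "nil":
        raise ValueError("foot node requires <comp>=nil")
    result = dict(clause)
    result["comp"] = comp  # the root <comp> value comes from the anchor
    return result


def is_valid_root(clause: dict) -> bool:
    """A root S may not carry a complementizer."""
    return clause.get("comp", "nil") == "nil"


if __name__ == "__main__":
    bare = {"mode": "ind", "comp": "nil"}
    with_that = adjoin_complementizer(bare, "that")
    print(with_that["comp"], is_valid_root(with_that))
    try:
        adjoin_complementizer(with_that, "whether")
    except ValueError as err:
        print("blocked:", err)
```

Because the function returns a copy, the original bare clause stays available, which loosely reflects the fact that adjunction builds a new derived tree instead of altering an elementary one.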



8.5 Case assignment, for and the two to's 

The <assign-comp> feature is used to represent the requirements of particular types of clauses
for particular complementizers. So while the <comp> feature represents constraints
originating from the VP dominating the clause, the <assign-comp> feature represents constraints
originating from the highest VP in the clause. <assign-comp> is used to control the
appearance of subjects in infinitival clauses (see discussion of ECM constructions in
section 8.6.1), to block bare indicative sentential subjects (bare infinitival subjects are allowed), and
to block 'that-trace' violations.

5 <mode> actually conflates several types of information, in particular verb form and mood.
6 See elsewhere in this report for a discussion of the difference between complements and adjuncts in the XTAG grammar.
7 Because root S's cannot have complementizers, the parser checks that the root S has <comp>=nil at the
end of the derivation, when the S is also checked for a tensed verb.

Examples (33), (34) and (35) show that an accusative case subject is obligatory in an
infinitive clause if the complementizer for is present. The infinitive clause in (32) is analyzed
in the English XTAG grammar as having a PRO subject.

(32) Christy wants to pass the exam. 



(33) Mike wants for her to pass the exam. 



(34) *Mike wants for she to pass the exam. 



(35) *Christy wants for to pass the exam. 

The for-to construction is particularly illustrative of the difficulties and benefits faced in
using a lexicalized grammar. It is commonly accepted that for behaves as a case-assigning
complementizer in this construction, assigning accusative case to the 'subject' of the clause
since the infinitival verb does not assign case to its subject position. However, in our featurized
grammar, the absence of a feature licenses anything, so we must have overt null case assigned
by infinitives to ensure the correct distribution of PRO subjects. (See section 4.4 for more
discussion of case assignment.) This null case assignment clashes with accusative case assignment
if we simply add for as a standard complementizer, since NP's (including PRO) are drawn from
the lexicon already marked for case. Thus, we must use the <assign-comp> feature to pass
information about the verb up to the root of the embedded sentence. To capture these facts,
two infinitive to's are posited. One infinitive to has <assign-case>=none, which forces a PRO
subject, and <assign-comp>=inf_nil, which prevents for from adjoining. The other infinitive
to has no value at all for <assign-case> and has <assign-comp>=for/ecm, so that it can
only occur either with the complementizer for or with ECM constructions. In those instances
either for or the ECM verb supplies the <assign-case> value, assigning accusative case to the
overt subject.
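
The division of labour between the two to's can be sketched in Python. The feature values for the two to entries follow the text; everything else is an assumption made for illustration: the names TO_ENTRIES, LICENSERS and subject_case are hypothetical, and a clause with no complementizer is modelled here as contributing <assign-comp>=inf_nil and no case.

```python
from typing import Optional

# The two infinitival to's, with the feature values given in the text.
TO_ENTRIES = {
    "to_pro": {"assign_case": "none", "assign_comp": {"inf_nil"}},
    "to_overt": {"assign_case": None, "assign_comp": {"for", "ecm"}},
}

# What sits above the infinitive (the 'bare' entry is a modelling assumption).
LICENSERS = {
    "bare": {"assign_case": None, "assign_comp": "inf_nil"},
    "for": {"assign_case": "acc", "assign_comp": "for"},
    "ecm": {"assign_case": "acc", "assign_comp": "ecm"},
}


def subject_case(to_name: str, licenser_name: str) -> Optional[str]:
    """Return the case of the infinitive's subject, or None if unification fails."""
    to = TO_ENTRIES[to_name]
    lic = LICENSERS[licenser_name]
    # <assign-comp> must unify: the licenser's value must be one to allows.
    if lic["assign_comp"] not in to["assign_comp"]:
        return None
    # <assign-case> must unify: an unspecified value takes the other's value.
    if to["assign_case"] is None:
        return lic["assign_case"]
    if lic["assign_case"] in (None, to["assign_case"]):
        return to["assign_case"]
    return None


if __name__ == "__main__":
    for to_name in TO_ENTRIES:
        for licenser in LICENSERS:
            print(to_name, licenser, subject_case(to_name, licenser))
```

In this sketch only three combinations succeed: the PRO-forcing to with no complementizer (null case, as in (32)), and the other to with either for or an ECM verb (accusative case, as in (33)).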



8.6 Sentential Complements of Verbs 

Tree families: Tnx0Vs1, Tnx0Vnx1s2, TItVnx1s2, TItVpnx1s2, TItVad1s2.

Verbs that select sentential complements restrict the <mode> and <comp> values for
those complements. Since, with very few exceptions,8 long distance extraction is possible from
sentential complements, the S complement nodes are adjunction nodes. Figure 8.2 shows the
declarative tree for sentential complements, anchored by think.

The need for an adjunction node rather than a substitution node at S1 may not be obvious
until one considers the derivation of sentences with long distance extractions. For example, the
declarative in (36) is derived by adjoining the tree in Figure 8.3(b) to the S1 node of the tree in
Figure 8.3(a). Since there are no bottom features on S1, the same final result could have been
achieved with a substitution node at S1.

(36) The emu thinks that the aardvark smells terrible. 
[Figure: tree with a subject NP, a VP, the verb anchor V (think), and an S foot node]

Figure 8.2: Sentential complement tree: βnx0Vs1




[Figure: (a) tree for the aardvark smells terrible; (b) tree for the emu thinks S]

Figure 8.3: Trees for The emu thinks that the aardvark smells terrible.

However, adjunction is crucial in deriving sentences with long distance extraction, as in 
sentences (37) and (38). 

(37) Who does the emu think smells terrible? 

(38) Who did the elephant think the panda heard the emu say smells terrible? 



The example in (37) is derived from the trees for who smells terrible? shown in Figure 8.4
and the emu thinks S shown in Figure 8.3(b), by adjoining the latter at the S_r node of the
former.9 This process is recursive, allowing sentences like (38). Such a representation has been
shown by [Kroch and Joshi, 1985] to be well-suited for describing unbounded dependencies.

In English, a complementizer may not appear on a complement with an extracted subject 
(the 'that-trace' configuration). This phenomenon is illustrated in (39)-(41): 

(39) Which animal did the giraffe say that he likes? 



8 For example, long distance extraction is not possible from the S complement in it-clefts.
9 See Chapter ^ for a discussion of do-support. 
Figure 8.4: Tree for Who smells terrible?



(40) *Which animal did the giraffe say that likes him? 



(41) Which animal did the giraffe say likes him? 



These sentences are derived in XTAG by adjoining the tree for did the giraffe say S at
the S_r node of the tree for either which animal likes him (to yield sentence (41)) or which
animal he likes (to yield sentence (39)). That-trace violations are blocked by the presence
of the <assign-comp>=inf_nil/ind_nil/ecm feature on the bottom of the S_r node
of trees with extracted subjects (W0), i.e. those used in sentences such as (40) and (41).
If a complementizer tree, βCOMPs, adjoins to a subject extraction tree at S_r, its
<assign-comp>=that/whether/for/if feature will clash and the derivation will fail. If there is no
complementizer, there is no feature clash, and this will permit the derivation of sentences like
(41), or of ECM constructions, in which case the ECM verb will have <assign-comp>=ecm
(see section 8.6.1 for more discussion of the ECM case). Complementizers may adjoin normally
to object extraction trees such as those used in sentence (39), and so object extraction trees
have no value for the <assign-comp> feature.
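
The feature clash that blocks that-trace violations can be sketched with disjunctive values modelled as sets, where unification is set intersection. This is an illustrative simplification, not the XTAG unifier; the names unify_disjunctive and complementizer_can_adjoin are hypothetical, and None stands for "no value".

```python
from typing import FrozenSet, Optional

Value = Optional[FrozenSet[str]]


def unify_disjunctive(a: Value, b: Value) -> Value:
    """Unify two disjunctive atomic values; None means the feature has no value."""
    if a is None:
        return b
    if b is None:
        return a
    common = a & b
    if not common:
        raise ValueError(f"feature clash: {sorted(a)} vs {sorted(b)}")
    return common


# <assign-comp> values taken from the text.
SUBJECT_EXTRACTION = frozenset({"inf_nil", "ind_nil", "ecm"})
OBJECT_EXTRACTION = None  # object extraction trees have no value
COMP_TREE = frozenset({"that", "whether", "for", "if"})


def complementizer_can_adjoin(tree_assign_comp: Value) -> bool:
    """True if a complementizer tree's <assign-comp> unifies with the tree's value."""
    try:
        unify_disjunctive(tree_assign_comp, COMP_TREE)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(complementizer_can_adjoin(SUBJECT_EXTRACTION))  # (40) is blocked
    print(complementizer_can_adjoin(OBJECT_EXTRACTION))   # (39) is derivable
```

The same intersection also shows why ECM verbs are compatible with subject extraction: unifying the subject extraction value with a singleton ecm value leaves ecm, so no clash arises.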

In the case of indirect questions, subjacency follows from the principle that a given tree
cannot contain more than one wh-element. Extraction out of an indirect question is ruled out
because a sentence like:

(42) *Who_i do you wonder who_j e_i loves e_j?

would have to be derived from the adjunction of do you wonder into who_i who_j e_i loves e_j,
which is an ill-formed elementary tree.10

10 This does not mean that elementary trees with more than one gap should be ruled out across the grammar. 
Such trees might be required for dealing with parasitic gaps or gaps in coordinated structures. 






8.6.1 Exceptional Case Marking Verbs 



Tree family: TXnx0Vs1

Exceptional Case Marking verbs are those which assign accusative
case to the subject of the sentential complement. This is in contrast to verbs in the Tnx0Vnx1s2
family (section 6.6), which assign accusative case to an NP which is not part of the sentential
complement.

The subject of an ECM infinitive complement is assigned accusative case in a manner
analogous to that of a subject in a for-to construction, as described in section 8.5. As in the for-to
case, the ECM verb assigns accusative case to the subject of the lower infinitive, and so the
infinitive uses the to which has no value for <assign-case> and has <assign-comp>=for/ecm.
The ECM verb has <assign-comp>=ecm and <assign-case>=acc on its foot. The former
allows the <assign-comp> features of the ECM verb and the to tree to unify, and so be used
together, and the latter assigns the accusative case to the lower subject.

Figure 8.5 shows the declarative tree for the TXnx0Vs1 family, in this case
anchored by expects. Figure 8.6 shows a parse for Van expects Bob to talk.




[Figure: auxiliary tree anchored by expects; the feature structure shown includes assign-comp : ecm, inv : -, extracted : -, comp : nil, mode : inf, wh : -]

Figure 8.5: ECM tree: βXnx0Vs1



The ECM and for-to cases are analogous in how they are used together with the correct
infinitival to to assign accusative case to the subject of the lower infinitive. However, they are
different in that for is blocked along with other complementizers in subject extraction contexts,
as discussed in section 8.6, as in (43), while subject extraction is compatible with ECM cases,
as in (44).



(43) *What child did the giraffe ask for to leave? 



(44) Who did Bill expect to eat beans? 
Figure 8.6: Sample ECM parse



Sentence (43) is ruled out by the <assign-comp>=inf_nil/ind_nil/ecm feature on the
subject extraction tree for ask, since the <assign-comp>=for feature from the for tree will
fail to unify. However, (44) will be allowed since the <assign-comp>=ecm feature on the expect
tree will unify with the foot of the ECM verb tree. The use of features allows the ECM and
for-to constructions to act the same for exceptional case assignment, while also being distinguished
for that-trace violations.

Verbs that take bare infinitives, as in (45), are also treated as ECM verbs, the only difference
being that their foot feature has <mode>=base instead of <mode>=inf. Since the
complement does not have to, there is no question of using the to tree for allowing accusative case to be
assigned. Instead, verbs with <mode>=base allow either accusative or nominative case to be
assigned to the subject, and the foot of the ECM bare infinitive tree forces it to be accusative:
the <assign-case>=acc value at its foot node unifies with the <assign-case>=nom/acc
value of the bare infinitive clause.



(45) Bob sees the harmonica fall. 



The trees in the TXnx0Vs1 family are generally parallel to those in the Tnx0Vs1 family,
except for the <assign-case> and <assign-comp> values on the foot nodes. However, the
TXnx0Vs1 family also includes a tree for the passive, which of course is not included in the
Tnx0Vs1 family. Unlike all the other trees in the TXnx0Vs1 family, the passive tree is not rooted
in S, and is instead a VP auxiliary tree. Since the subject of the infinitive is not thematically
selected by the ECM verb, it is not part of the ECM verb's tree, and so it cannot be part of
the passive tree. Therefore, the passive acts as a raising verb (see section 9.3). For example,
to derive (47), the tree in Figure 8.7 would adjoin into a derivation for Bob to talk at the VP
node (and the <mode>=passive feature, not shown, forces the auxiliary to adjoin in, as for
other passives, as described in chapter [l^). 

(46) Van expects Bob to talk. 

(47) Bob was expected to talk. 



Figure 8.7: ECM passive (VP auxiliary tree with a V node anchored by expected and a VP foot node)

It has long been noted that passives of both full and bare infinitive ECM constructions are 
full infinitives, as in (47) and (49). 

(48) Bob sees the harmonica fall. 

(49) The harmonica was seen to fall. 

(50) *The harmonica was seen fall. 

Under the TAG ECM analysis, this fact is easy to implement. The foot node of the ECM
passive tree is simply set to have <mode>=inf, which prevents the derivation of (50). Thus,
for all the other trees in the family, the foot nodes are set to have <mode>=base or
<mode>=inf depending on whether the verb takes a bare infinitive or not. These foot nodes are
all S nodes. The VP foot node of the passive tree, however, has <mode>=inf regardless.
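The effect of the <mode> values on the foot nodes can be illustrated with a small sketch. It is not the XTAG implementation: it assumes that a feature value is a set of atomic alternatives and that unification is set intersection, and the names `unify`, `ACTIVE_SEE_FOOT`, `PASSIVE_FOOT` and the clause constants are hypothetical.

```python
def unify(a, b):
    """Unify two disjunctions of atomic values; return None on failure."""
    common = a & b
    return common or None


# Hypothetical encodings of the <mode> value on the foot nodes.
ACTIVE_SEE_FOOT = {"base"}     # bare infinitive ECM verb, active trees
ACTIVE_EXPECT_FOOT = {"inf"}   # full infinitive ECM verb, active trees
PASSIVE_FOOT = {"inf"}         # ECM passive tree, whatever the verb

# Hypothetical <mode> values of the material the foot node unifies with.
BARE = {"base"}                # "the harmonica fall" / "fall"
TO_INF = {"inf"}               # "Bob to talk" / "to fall"

checks = {
    "(46) Van expects Bob to talk": unify(ACTIVE_EXPECT_FOOT, TO_INF),
    "(48) Bob sees the harmonica fall": unify(ACTIVE_SEE_FOOT, BARE),
    "(49) The harmonica was seen to fall": unify(PASSIVE_FOOT, TO_INF),
    "(50) *The harmonica was seen fall": unify(PASSIVE_FOOT, BARE),
}

for sentence, result in checks.items():
    print(sentence, "->", "ok" if result else "blocked")
```

Only the last check fails, because the passive foot node requires inf while the bare infinitive supplies base.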

8.7 Sentential Subjects 

Tree families: Ts0Vnx1, Ts0Ax1, Ts0N1, Ts0Pnx1, Ts0ARBPnx1, Ts0PPnx1, Ts0PNaPnx1,
Ts0V, Ts0Vtonx1, Ts0NPnx1, Ts0APnx1, Ts0A1s1.

Verbs that select sentential subjects anchor trees that have an S node in the subject position
rather than an NP node. Since extraction is not possible from sentential subjects, they are
implemented as substitution nodes in the English XTAG grammar. Restrictions on sentential
subjects, such as the required that complementizer for indicatives, are enforced by feature values
specified on the S substitution node in the elementary tree.

Sentential subjects behave essentially like sentential complements, with a few exceptions.
In general, all verbs which license sentential subjects license the same set of clause types. Thus,
unlike sentential complement verbs which select particular complementizers and clause types,
the matrix verbs licensing sentential subjects merely license the S argument. Information about
the complementizer or embedded verb is located in the tree features, rather than in the features
of each verb selecting that tree. Thus, all sentential subject trees have the same <mode>,
<comp> and <assign-comp> values shown in Figure 8.8(a).



[Figure 8.8, panel (a): tree anchored by perplexes; its S0 subject node has extracted : -, inv : -,
assign-comp : inf_nil, comp : that/for/whether/nil, mode : inf/ind. Panel (b): tree anchored by
thinks; its S1* foot node has assign-comp : inf_nil/ind_nil, inv : -, comp : that/nil, mode : ind,
wh : -.]

Figure 8.8: Comparison of <assign-comp> values for sentential subjects: αs0Vnx1 (a) and
sentential complements: βnx0Vs1 (b)



The major difference in clause types licensed by S-subjs and S-comps is that indicative
S-subjs obligatorily have a complementizer (see examples in section 3.2). The <assign-comp>
feature is used here to license a null complementizer for infinitival but not indicative clauses.
<assign-comp> has the same possible values as <comp>, with the exception that the nil
value is 'split' into ind_nil and inf_nil. This difference in feature values is illustrated in
Figure 8.8.

Another minor difference is that whether but not if is grammatical with S-subjs. Thus, if
is not among the <comp> values allowed in S-subjs. The final difference from S-comps is that
there are no S-subjs with <mode>=ger. As noted in an earlier footnote of this chapter, gerundive
complements are only allowed when there is no corresponding NP parse. In the case of gerundive
S-subjs, there is always an NP parse available.
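The licensing conditions on sentential subjects described in this section can be summarized in a short sketch. It is a deliberate simplification and not the grammar's actual feature machinery: the dictionary `S_SUBJ_NODE` and the function `licensed_as_subject` are hypothetical, the null complementizer is checked by looking up the 'split' nil value (ind_nil or inf_nil), and compatibility between a particular complementizer and a particular mode is not modeled.

```python
# Hypothetical, simplified feature values for the S subject node (Figure 8.8(a)).
S_SUBJ_NODE = {
    "mode": {"inf", "ind"},                     # no ger
    "comp": {"that", "for", "whether", "nil"},  # no if
    "assign-comp": {"inf_nil"},                 # null complementizer only for infinitives
}


def licensed_as_subject(mode, comp):
    """Return True if a clause with this mode and complementizer may be an S-subj."""
    if mode not in S_SUBJ_NODE["mode"]:
        return False
    if comp not in S_SUBJ_NODE["comp"]:
        return False
    if comp == "nil":
        # The nil value is split by mode: ind_nil versus inf_nil.
        return f"{mode}_nil" in S_SUBJ_NODE["assign-comp"]
    return True


for mode, comp in [("ind", "that"), ("ind", "nil"), ("inf", "nil"),
                   ("ind", "whether"), ("ind", "if"), ("ger", "nil")]:
    print(mode, comp, licensed_as_subject(mode, comp))
```

Under these assumptions an indicative subject clause without a complementizer is rejected, while an infinitival one is accepted.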



8.8 Nouns and Prepositions taking Sentential Complements 

Trees: αNXNs, βvxPs, βPss, βnxPs, Tnx0N1s1, Tnx0A1s1.

Prepositions and nouns can also select sentential complements, using the trees listed above.
These trees use the <mode> and <comp> features as shown in Figure 8.9. For example, the
noun claim takes only indicative complements with that, while the preposition with takes small
clause complements, as seen in sentences (51)-(54).

(51) Beth's claim that Clove was a smart dog....

(52) *Beth's claim that Clove a smart dog....

(53) Dania wasn't getting any sleep with Doug sick.

(54) *Dania wasn't getting any sleep with Doug was sick.

Figure 8.9: Sample trees for preposition: βPss (a) and noun: αNXNs (b) taking sentential
complements

Footnote to section 8.7: Some speakers also find if as a complementizer only marginally
grammatical in S-comps.



8.9 PRO control

8.9.1 Types of control

In the literature on control, two types are often distinguished: obligatory control, as in
sentences (55), (56), (57), and (58), and optional control, as in sentence (59).

(55) Srini_i promised Mickey_j [PRO_i to leave].

(56) Srini persuaded Mickey_j [PRO_j to leave].

(57) Srini_i wanted [PRO_i to leave].

(58) Christy_i left the party early [PRO_i to go to the airport].

(59) [PRO_arb/i to dance] is important for Bill_i.

At present, an analysis for obligatory control into complement clauses (as in sentences (55),
(56), and (57)) has been implemented. An analysis for cases of obligatory control into adjunct
clauses and optional control exists and can be found in [Bhatt, 1994].



8.9.2 A feature-based analysis of PRO control 

The analysis for obligatory control involves co-indexation of the control feature of the NP 
anchored by PRO to the control feature of the controller. A feature equation in the tree 
anchored by the control verb co-indexes the control feature of the controlling NP with the foot 
node of the tree. All sentential trees have a co-indexed control feature from the root S to the 
subject NP. 

When the tree containing the controller adjoins onto the complement clause tree containing
the PRO, the features of the foot node of the auxiliary tree are unified with the bottom features
of the root node of the complement clause tree containing the PRO. This leads to the control
feature of the controller being co-indexed with the control feature of the PRO.

Depending on the choice of the controlling verb, the control propagation paths in the
auxiliary trees are different. In the case of subject control (as in sentence (55)), the subject NP
and the foot node have co-indexed control features, while for object control (e.g. sentence (56)),
the object NP and the foot node are co-indexed for control. Among verbs that belong to the
Tnx0Vnx1s2 family, i.e. verbs that take an NP object and a clausal complement, subject-control
verbs form a distinct minority, promise being the only commonly used verb in this class.

Consider the derivation of sentence (56). The auxiliary tree for persuade, shown in Figure
8.10, has the following feature equation (60).

(60) NP_1:<control> = S_2.t:<control>

The auxiliary tree adjoins into the tree for leave, shown in Figure 8.11, which has the following
feature equation (61).

(61) S_r.b:<control> = NP_0.t:<control>

Since the adjunction takes place at the root node (S_r) of the leave tree, after unification, NP_1
of the persuade tree and NP_0 of the leave tree share a control feature. The resulting derived
and derivation trees are shown in Figures 8.12 and 8.13.
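The co-indexation steps in this derivation can be traced with a small sketch. It is only an illustration under stated assumptions, not the XTAG unifier: co-indexing is modeled as merging equivalence classes of feature slots with a union-find structure, the top and bottom feature structures of a node are collapsed into one slot, and the class name `Features` and the slot labels are hypothetical.

```python
class Features:
    """Union-find over feature slots; co-indexed slots share a representative."""

    def __init__(self):
        self.parent = {}

    def find(self, slot):
        self.parent.setdefault(slot, slot)
        while self.parent[slot] != slot:
            # Path halving keeps the lookup chains short.
            self.parent[slot] = self.parent[self.parent[slot]]
            slot = self.parent[slot]
        return slot

    def coindex(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def coindexed(self, a, b):
        return self.find(a) == self.find(b)


f = Features()

# (60), persuade tree: the object NP_1 shares <control> with the foot node S_2.
f.coindex(("persuade", "NP_1"), ("persuade", "S_2"))

# (61), leave tree: the root S_r shares <control> with the subject NP_0 (PRO).
f.coindex(("leave", "S_r"), ("leave", "NP_0"))

# Adjunction at S_r: the foot node of persuade unifies with the root of leave.
f.coindex(("persuade", "S_2"), ("leave", "S_r"))

# Object control: Mickey (NP_1) now shares a control feature with PRO (NP_0).
print(f.coindexed(("persuade", "NP_1"), ("leave", "NP_0")))
# The subject of persuade (Srini) is not part of that class.
print(f.coindexed(("persuade", "NP_0"), ("leave", "NP_0")))
```

For a subject-control verb such as promise, the first equation would instead link the subject slot of the auxiliary tree to its foot node.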



8.9.3 The nature of the control feature 

The control feature does not have any value and is used only for co-indexing purposes. If two 
NPs have their control features co-indexed, it means that they are participating in a relationship 
of control; the c-commanding NP controls the c-commanded NP. 

Figure 8.10: Tree for persuaded (the object NP_1 and the foot node S_2* share [control : <1>])

Figure 8.11: Tree for leave (the root S_r and the subject NP_0 share [control : <1>])

8.9.4 Long-distance transmission of control features

Cases involving embedded infinitival complements with PRO subjects such as (62) can also be
handled.

(62) John_i wants [PRO_i to want [PRO_i to dance]].

The control features of 'John' and the two PROs all get co-indexed. This treatment might
appear to lead to a problem. Consider (63):

(63) John_*i wants [Mary_i to want [PRO_i to dance]].

If both the 'want' trees have the control feature of their subject co-indexed to their foot
nodes, we would have a situation where the PRO is co-indexed for control feature with 'John',
as well as with 'Mary'. Note that the higher 'want' in (63) is wantECM: it assigns case to the
subject of the lower clause, while the lower 'want' in (63) is not. Subject control is restricted to
non-ECM (Exceptional Case Marking) verbs that take infinitival complements. Since the two
'want's in (63) are different with respect to their control (and other) properties, the control
feature of PRO stops at 'Mary' and is not transmitted to the higher clause.

Figure 8.12: Derived tree for Srini persuaded Mickey to leave

αnx0V[leave]
    βnx0Vnx1s2[persuaded] (0)
        αNXN[Srini] (1)
        αNXN[Mickey] (2.2)
    αNX[PRO] (1)
    βVvx[to] (2)

Figure 8.13: Derivation tree for Srini persuaded Mickey to leave

8.9.5 Locality constraints on control

PRO control obeys locality constraints. The controller for PRO has to be in the immediately
higher clause. Consider the ungrammatical sentence (64) ((64) is ungrammatical only with the
co-indexing indicated below).

(64) *John_i wants [PRO_i to persuade Mary_j [PRO_i to dance]]

However, such a derivation is ruled out automatically by the mechanisms of a TAG derivation 
and feature unification. Suppose it was possible to first compose the want tree with the dance 
tree and then insert the persuade tree. (This is not possible in the XTAG grammar because of 
the convention that auxiliary trees have NA (Null Adjunction) constraints on their foot nodes.) 
Even then, at the end of the derivation the control feature of the subject of want would end 
up co-indexed with the PRO subject of persuade and the control feature of Mary would be 
co-indexed with the PRO subject of dance as desired. There is no way to generate the illegal
co-indexing in (64). Thus the locality constraints on PRO control fall out from the mechanics
of TAG derivation and feature unification.

8.10 Reported speech

Reported speech is handled in the XTAG grammar by having the reporting clause adjoin into
the quote. Thus, the reporting clause is an auxiliary tree, anchored by the reporting verb. See
[Doran, 1998] for details of the analysis. There are trees in both the Tnx0Vs1 and Tnx0nx1s2
families to handle reporting clauses which precede, follow and come in the middle of the quote.



Chapter 9 



The English Copula, Raising Verbs, 
and Small Clauses 

The English copula, raising verbs, and small clauses are all handled in XTAG by a common 
analysis based on sentential clauses headed by non-verbal elements. Since there are a number of 
different analyses in the literature of how these phenomena are related (or not), we will present 
first the data for all three phenomena, then various analyses from the literature, finishing with 
the analysis used in the English XTAG grammar (see footnote 1).

9.1 Usages of the copula, raising verbs, and small clauses 
9.1.1 Copula 

The verb be as used in sentences (65)-(67) is often referred to as the copula. It can be followed
by a noun, adjective, or prepositional phrase.

(65) Carl is a jerk . 



(66) Carl is upset . 



(67) Carl is in a foul mood . 

Although the copula may look like a main verb at first glance, its syntactic behavior follows 
the auxiliary verbs rather than main verbs. In particular, 

• Copula be inverts with the subject. 

(68) is Beth writing her dissertation ? 
is Beth upset ? 

*wrote Beth her dissertation ? 



1 This chapter is strongly based on [Heycock, 1991]. Sections 9.1 and 9.2 are greatly condensed
from her paper, while the description of the XTAG analysis in section 9.3 is an updated and
expanded version.

• Copula be occurs to the left of the negative marker not.

(69) Beth is not writing her dissertation . 
Beth is not upset . 

*Beth wrote not her dissertation . 

• Copula be can contract with the negative marker not. 

(70) Beth isn't writing her dissertation . 
Beth isn't upset . 

*Beth wroten't her dissertation . 

• Copula be can contract with pronominal subjects. 

(71) She's writing her dissertation . 
She's upset . 

*She'ote her dissertation . 

• Copula be occurs to the left of adverbs in the unmarked order. 

(72) Beth is often writing her dissertation . 
Beth is often upset . 

*Beth wrote often her dissertation . 

Unlike all the other auxiliaries, however, copula be is not followed by a verbal category (by 
definition) and therefore must be the rightmost verb. In this respect, it is like a main verb. 

The semantic behavior of the copula is also unlike main verbs. In particular, any semantic 
restrictions or roles placed on the subject come from the complement phrase (NP, AP, PP) rather 
than from the verb, as illustrated in sentences (73) and (74). Because the complement phrases 
predicate over the subject, these types of sentences are often called predicative sentences. 

(73) The bartender was garrulous . 

(74) ?The cliff was garrulous . 
9.1.2 Raising Verbs 

Raising verbs are the class of verbs that share with the copula the property that the complement, 
rather than the verb, places semantic constraints on the subject. 

(75) Carl seems a jerk . 
Carl seems upset . 

Carl seems in a foul mood . 

(76) Carl appears a jerk . 
Carl appears upset . 

Carl appears in a foul mood .

The raising verbs are similar to auxiliaries in that they order with other verbs, but they
are unique in that they can appear to the left of the infinitive, as seen in the sentences in (77). 
They cannot, however, invert or contract like other auxiliaries (78), and they appear to the 
right of adverbs (79). 

(77) Carl seems to be a jerk . 
Carl seems to be upset . 

Carl seems to be in a foul mood . 

(78) *seems Carl to be a jerk ? 
*Carl seemn't to be upset . 
*Carl'ems to be in a foul mood . 

(79) Carl often seems to be upset . 
*Carl seems often to be upset . 

9.1.3 Small Clauses 

One way of describing small clauses is as predicative sentences without the copula. Since matrix 
clauses require tense, these clausal structures appear only as embedded sentences. They occur 
as complements of certain verbs, each of which may allow certain types of small clauses but not 
others, depending on its lexical idiosyncrasies. 

(80) I consider [Carl a jerk] . 
I consider [Carl upset] . 

?I consider [Carl in a foul mood] . 

(81) I prefer [Carl in a foul mood] . 
??I prefer [Carl upset] . 

9.1.4 Raising Adjectives 

Raising adjectives are the class of adjectives that share with the copula and raising verbs the
property that the complement, rather than the adjective, places semantic constraints on the subject.

They appear with the copula in a matrix clause, as in (82). However, in other cases, such 
as that of small clauses (83) , they do not have to appear with the copula. 

(82) Carl is likely to be a jerk . 
Carl is likely to be upset . 

Carl is likely to be in a foul mood . 
Carl is likely to perjure himself . 



(83) I consider Carl likely to perjure himself . 



9.2 Various Analyses

9.2.1 Main Verb Raising to INFL + Small Clause 



In [Pollock, 1989] the copula is generated as the head of a VP, like any main verb such as
sing or buy. Unlike all other main verbs (see footnote 2), however, be moves out of the VP and
into Infl in a tensed sentence. This analysis aims to account for the behavior of be as an
auxiliary in terms of inversion, negative placement and adverb placement, while retaining a
sentential structure in which be heads the main VP at D-Structure and can thus be the only verb
in the clause.

Pollock claims that the predicative phrase is not an argument of be, which instead he assumes 
to take a small clause complement, consisting of a node dominating an NP and a predicative 
AP, NP or PP. The subject NP of the small clause then raises to become the subject of the 
sentence. This accounts for the failure of the copula to impose any selectional restrictions on 
the subject. Raising verbs such as seem and appear, presumably, take the same type of small 
clause complement. 

9.2.2 Auxiliary + Null Copula 



In [Lapointe, 1980] the copula is treated as an auxiliary verb that takes as its complement a VP
headed by a passive verb, a present participle, or a null verb (the true copula). This verb may
then take AP, NP or PP complements. The author points out that there are many languages
that have been analyzed as having a null copula, but that English has the peculiarity that its
null copula requires the co-presence of the auxiliary be.

9.2.3 Auxiliary + Predicative Phrase 



In GPSG ([Gazdar et al., 1985], [Sag et al., 1985]) the copula is treated as an auxiliary verb
that takes an X² category with a + value for the head feature [PRD] (predicative). AP, NP, PP
and VP can all be [+PRD], but a Feature Co-occurrence Restriction guarantees that a [+PRD]
VP will be headed by a verb that is either passive or a present participle.

GPSG follows [Chomsky, 1970] in adopting the binary valued features [V] and [N] for
decomposing the verb, noun, adjective and preposition categories. In that analysis, verbs are
[+V,-N], nouns are [-V,+N], adjectives [+V,+N] and prepositions [-V,-N]. NP and AP
predicative complements generally pattern together, a fact that can be stated economically
using this category decomposition. In neither [Sag et al., 1985] nor [Chomsky, 1970] is there
any discussion of how to handle the complete range of complements to a verb like seem, which
takes AP, NP and PP complements, as well as infinitives. The solution would appear to be to
associate the verb with two sets of rules for small clauses, leaving aside the use of the verb with
an expletive subject and sentential complement.

9.2.4 Auxiliary + Small Clause 



In [Moro, 1990] the copula is treated as a special functional category - a lexicalization of tense,
which is considered to head its own projection. It takes as a complement the projection of
another functional category, Agr (agreement). This projection corresponds roughly to a small
clause, and is considered to be the domain within which predication takes place. An NP must
then raise out of this projection to become the subject of the sentence: it may be the subject
of the AgrP, or, if the predicate of the AgrP is an NP, this may raise instead. In addition
to occurring as the complement of be, AgrP is selected by certain verbs such as consider. It
follows from this analysis that when the complement to consider is a simple AgrP, it will always
consist of a subject followed by a predicate, whereas if the complement contains the verb be,
the predicate of the AgrP may raise to the left of be, leaving the subject of the AgrP to the
right.

2 with the exception of have in British English. See footnote ^ in Chapter

(84) John_i is [AgrP t_i the culprit ] .

(85) The culprit_i is [AgrP John t_i ] .

(86) I consider [AgrP John the culprit] .

(87) I consider [John_i to be [AgrP t_i the culprit ]] .

(88) I consider [the culprit_i to be [AgrP John t_i ]] .

Moro does not discuss a number of aspects of his analysis, including the nature of Agr and 
the implied existence of sentences without VP's. 

9.3 XTAG analysis 




Figure 9.1: Predicative trees: αnx0N1 (a), αnx0Ax1 (b) and αnx0Pnx1 (c)

The XTAG grammar provides a uniform analysis for the copula, raising verbs and small
clauses by treating the maximal projections of lexical items that can be predicated as predicative
clauses, rather than simply noun, adjective and prepositional phrases. The copula adjoins in
for matrix clauses, as do the raising verbs. Certain other verbs (such as consider) can take the
predicative clause as a complement, without the adjunction of the copula, to form the embedded
small clause.

The structure of a predicative clause, then, is roughly as seen in (89)-(91) for NP's, AP's
and PP's. The XTAG trees corresponding to these structures (see footnote 3) are shown in
Figures 9.1(a), 9.1(b), and 9.1(c), respectively.



(89) [S NP [VP N ...]]

(90) [S NP [VP A ...]]

(91) [S NP [VP P ...]]

The copula be and raising verbs all get the basic auxiliary tree as explained in the section
on auxiliary verbs (section 20.1). Unlike the raising verbs, the copula also selects the inverted
auxiliary tree set. Figure 9.2 shows the basic auxiliary tree anchored by the copula be. The
<mode> feature is used to distinguish the predicative constructions so that only the copula
and raising verbs adjoin onto the predicative trees.

There are two possible values of <mode> that correspond to the predicative trees, nom
and prep. They correspond to a modified version of the four-valued [N,V] feature described
in section 9.2.3. The nom value corresponds to [+N], selecting the NP and AP predicative
clauses. As mentioned earlier, they often pattern together with respect to constructions using
predicative clauses. The remaining prepositional phrase predicative clauses, then, correspond
to the prep mode.



Figure 9.3 shows the predicative adjective tree from Figure 9.1(b) now anchored by upset and
with the features visible. As mentioned, <mode>=nom on the VP node prevents auxiliaries
other than the copula or raising verbs from adjoining into this tree. In addition, it prevents the
predicative tree from occurring as a matrix clause. Since all matrix clauses in XTAG must be
mode indicative (ind) or imperative (imp), a tree with <mode>=nom or <mode>=prep
must have an auxiliary verb (the copula or a raising verb) adjoin in to make it <mode>=ind.

The distribution of small clauses as embedded complements to some verbs is also man- 
aged through the mode feature. Verbs such as consider and prefer select trees that take 
a sentential complement, and then restrict that complement to be <mode>=nom and/or 
<mode>=prep, depending on the lexical idiosyncrasies of that particular verb. Many verbs 
that don't take small clause complements do take sentential complements that are <mode>=ind, 
which includes small clauses with the copula already adjoined. Hence, as seen in sentence sets 
(92)-(94), consider takes only small clause complements, prefer takes both prep (but not nom) 
small clauses and indicative clauses, while feel takes only indicative clauses. 

(92) She considers Carl a jerk . 

?She considers Carl in a foul mood . 
*She considers that Carl is a jerk . 

3 There are actually two other predicative trees in the XTAG grammar. Another predicative noun phrase tree
is needed for noun phrases without determiners, as in the sentence They are firemen, and another prepositional
phrase tree is needed for exhaustive prepositional phrases, such as The workers are below.

Figure 9.2: Copula auxiliary tree: βVvx (the anchor carries mode : ind, tense : pres, mainv : -,
assign-case : nom and third person singular agreement; the VP foot node has mode : nom/prep)

(93) *She prefers Carl a jerk . 

She prefers Carl in a foul mood . 
She prefers that Carl is a jerk . 



(94) *She feels Carl a jerk . 

*She feels Carl in a foul mood . 
She feels that Carl is a jerk . 
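The way <mode> restricts small clause complements in (92)-(94) can be checked with a small sketch. It is an illustration only: the tables `COMPLEMENT_MODES` and `CLAUSE_MODE` and the function `selects` are hypothetical names, the mode sets are read off the description of consider, prefer and feel given above together with the nom/prep value shown for consider in Figure 9.4, and the marginal status of some examples (marked ?) is not modeled.

```python
# Hypothetical table: which <mode> values each verb allows on its S complement.
COMPLEMENT_MODES = {
    "consider": {"nom", "prep"},   # small clauses only
    "prefer": {"prep", "ind"},     # prep small clauses and indicative clauses
    "feel": {"ind"},               # indicative clauses only
}

# <mode> of each complement clause used in examples (92)-(94).
CLAUSE_MODE = {
    "Carl a jerk": "nom",
    "Carl in a foul mood": "prep",
    "that Carl is a jerk": "ind",   # copula already adjoined
}


def selects(verb, clause):
    """Return True if the verb's complement node unifies with the clause's mode."""
    return CLAUSE_MODE[clause] in COMPLEMENT_MODES[verb]


for verb in COMPLEMENT_MODES:
    for clause in CLAUSE_MODE:
        mark = "" if selects(verb, clause) else "*"
        print(f"{mark}She {verb}s {clause} .")
```

The starred lines printed by the loop correspond to the starred examples in (92)-(94).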



Figure 9.4 shows the tree anchored by consider that takes the predicative small clauses. 

Raising verbs such as seems work essentially the same as the auxiliaries, in that they also
select the basic auxiliary tree, as in Figure 9.2. The only difference is that the value of <mode>
on the VP foot node might be different, depending on what types of complements the raising
verb takes. Also, two of the raising verbs take an additional tree, βVpxvx, shown in Figure 9.5,
which allows for an experiencer argument, as in John seems to me to be happy.

Figure 9.3: Predicative AP tree with features: αnx0Ax1



Raising adjectives, such as likely, take the tree shown in Figure 9.6. This tree combines
aspects of the auxiliary tree βVvx and the adjectival predicative tree shown in Figure 9.1(b).
As with βVvx, it adjoins in as a VP auxiliary tree. However, since it is anchored by an adjective,
not a verb, it is similar to the adjectival predicative tree in that it has an ε at the V node, and
a feature value of <mode>=nom which is passed up to the VP root indicates that it is an
adjectival predication. This serves the same purpose as in the case of the tree in Figure 9.3,
and forces another auxiliary verb, such as the copula, to adjoin in to make it <mode>=ind.

Figure 9.4: Consider tree for embedded small clauses

Figure 9.5: Raising verb with experiencer tree: βVpxvx

Figure 9.6: Raising adjective tree: βVvx-adj

9.4 Non-predicative BE

The examples with the copula that we have given seem to indicate that be is always followed 
by a predicative phrase of some sort. This is not the case, however, as seen in sentences such as 
(95)-(100). The noun phrases in these sentences are not predicative. They do not take raising 
verbs, and they do not occur in embedded small clause constructions. 

(95) My teacher is Mrs. Wayman .



(96) Doug is the man with the glasses . 

(97) *My teacher seems Mrs. Wayman . 

(98) *Doug appears the man with the glasses . 

(99) *I consider [my teacher Mrs. Wayman] . 

(100) *I prefer [Doug the man with the glasses] . 

In addition, the subject and complement can exchange positions in these types of examples
but not in sentences with predicative be. Sentence (101) has the same interpretation as sentence 
(96) and differs only in the positions of the subject and complement NP's. Similar sentences, 
with a predicative be, are shown in (102) and (103). In this case, the sentence with the exchanged 
NP's (103) is ungrammatical. 

(101) The man with the glasses is Doug . 



(102) Doug is a programmer . 

(103) *A programmer is Doug . 

The non-predicative be in (95) and (96), also called equative be, patterns differently, both 
syntactically and semantically, from the predicative usage of be. Since these sentences are 
clearly not predicative, it is not desirable to have a tree structure that is anchored by the NP, 
AP, or PP, as we have in the predicative sentences. In addition to the conceptual problem, we 
would also need a mechanism to block raising verbs from adjoining into these sentences (while 
allowing them for true predicative phrases), and prevent these types of sentence from being 
embedded (again, while allowing them for true predicative phrases). 

Although non-predicative be is not a raising verb, it does exhibit the auxiliary verb behavior
set out in section 9.1.1. It inverts, contracts, and so forth, as seen in sentences (104) and (105),
and therefore cannot be associated with any existing tree family for main verbs. It requires
a separate tree family that includes the tree for inversion. Figures 9.7(a) and 9.7(b) show the
declarative and inverted trees, respectively, for equative be.

(104) is my teacher Mrs. Wayman ? 



(105) Doug isn't the man with the glasses .



Figure 9.7: Equative BE trees: αnx0BEnx1 (a) and αInvnx0BEnx1 (b)



Chapter 10 



Ditransitive constructions and 
dative shift 

Verbs such as give and put that require two objects, as shown in examples (106)-(109), are 
termed ditransitive. 

(106) Christy gave a cannoli to Beth Ann . 

(107) *Christy gave Beth Ann . 

(108) Christy put a cannoli in the refrigerator . 

(109) *Christy put a cannoli . 

The indirect objects Beth Ann and refrigerator appear in these examples in the form of
PP's. Within the set of ditransitive verbs there is a subset that also allows two NP's, as in (110).
As can be seen from (110) and (111), this two-NP, or double-object, construction is grammatical
for give but not for put.

(110) Christy gave Beth Ann a cannoli . 



(111) *Christy put the refrigerator the cannoli .

The alternation between (106) and (110) is known as dative shift (see footnote 1). In order to
account for verbs with dative shift the English XTAG grammar includes structures for both
variants in the tree family Tnx0Vnx1Pnx2. The declarative trees for the shifted and non-shifted
alternations are shown in Figure 10.1.



The indexing of nodes in these two trees represents the fact that the semantic role of
the indirect object (NP2) in Figure 10.1(a) is the same as that of the direct object (NP2) in
Figure 10.1(b) (and vice versa). This use of indexing is consistent with our treatment of other
constructions such as passive and ergative.

Figure 10.1: Dative shift trees: αnx0Vnx1Pnx2 (a) and αnx0Vnx2nx1 (b)

1 In languages similar to English that have overt case marking, indirect objects would be marked
with dative case. It has also been suggested that for English the preposition to serves as a
dative case marker.

Verbs that do not show this alternation and have only the NP PP structure (e.g. put) select
the tree family Tnx0Vnx1pnx2. Unlike the Tnx0Vnx1Pnx2 family, the Tnx0Vnx1pnx2 tree
family does not contain trees for the NP NP structure. Other verbs such as ask allow only the
NP NP structure as shown in (112) and (113).

(112) Beth Ann asked Srini a question . 



(113) *Beth Ann asked a question to Srini . 

Verbs that only allow the NP NP structure select the tree family Tnx0Vnx1nx2. This tree
family does not have the trees for the NP PP structure.
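The division of labor among the three tree families can be summarized in a short sketch. It is only a lookup table built from the description above, not part of the XTAG lexicon: the dictionary and function names are hypothetical, the family names are written with the digits 0, 1 and 2, and only the three verbs discussed in the text are listed.

```python
# Hypothetical summary of which complement frames each tree family contains.
FAMILY_FRAMES = {
    "Tnx0Vnx1Pnx2": {"NP PP", "NP NP"},   # dative shift: both variants
    "Tnx0Vnx1pnx2": {"NP PP"},            # NP PP only
    "Tnx0Vnx1nx2": {"NP NP"},             # NP NP only
}

# Tree family selected by each verb mentioned in the text.
VERB_FAMILY = {
    "give": "Tnx0Vnx1Pnx2",
    "put": "Tnx0Vnx1pnx2",
    "ask": "Tnx0Vnx1nx2",
}


def allows(verb, frame):
    """Return True if the verb's tree family contains trees for this frame."""
    return frame in FAMILY_FRAMES[VERB_FAMILY[verb]]


for verb in VERB_FAMILY:
    for frame in ("NP PP", "NP NP"):
        print(verb, frame, allows(verb, frame))
```

The False entries correspond to the starred examples (111) and (113).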

Notice that in Figure 10.1(a) the preposition to is built into the tree. There are other
apparent cases of dative shift with for, such as in (114) and (115), that we have taken to be 
structurally distinct from the cases with to. 

(114) Beth Ann baked Dusty a biscuit . 



(115) Beth Ann baked a biscuit for Dusty .



McCawley, 1988] notes: 



A 11 for- dative" expression in underlying structure is external to the V with which it 
is combined, in view of the fact that the latter behaves as a unit with regard to all 
relevant syntactic phenomena. 

In other words, the for PP's that appear to undergo dative shift are actually adjuncts, not 
complements. Examples (116) and (117) demonstrate that PP's with for are optional while 
ditransitive to PP's are not. 






(116) Beth Ann made dinner . 



(117) *Beth Ann gave dinner . 

Consequently, in the XTAG grammar the apparent dative shift with for PP's is treated as 
Tnx0Vnx1nx2 for the NP NP structure, and as a transitive plus an adjoined adjunct PP for the 
NP PP structure. To account for the ditransitive to PP's, the preposition to is built into the 
tree family Tnx0Vnx1tonx2. This accounts for the fact that to is the only preposition allowed 
in dative shift constructions. 

[McCawley, 1988] also notes that the to and for cases differ with respect to passivization: the 
indirect objects with to may be the subjects of corresponding passives while the alleged indirect 
objects with for cannot, as in sentences (118)-(121). Note that the passivization examples are 
for NP NP structures of verbs that take to or for PP's. 

(118) Beth Ann gave Clove dinner . 



(119) Clove was given dinner (by Beth Ann) . 



(120) Beth Ann made Clove dinner . 



(121) ?Clove was made dinner (by Beth Ann) . 

However, we believe this to be incorrect, and that the indirect objects in the for case 
are allowed to be the subjects of passives, as in sentences (122)-(123). The apparent strangeness 
of sentence (121) is caused by interference from other interpretations of Clove was made dinner. 



(122) Dania baked Doug a cake . 



(123) Doug was baked a cake by Dania . 



Chapter 11 

It-clefts 



There are several varieties of it-clefts in English. All the it-clefts have four major components: 

• the dummy subject: it, 

• the main verb: be, 

• the clefted element: A constituent (XP) compatible with any gap in the clause, 

• the clause: A clause (e.g. S) with or without a gap. 

Examples of it-clefts are shown in (124)-(127). 

(124) it was [XP here XP] [S that the ENIAC was created . S] 

(125) it was [XP at MIT XP] [S that colorless green ideas slept furiously . S] 

(126) it is [XP happily XP] [S that Seth quit Reality . S] 

(127) it was [XP there XP] [S that she would have to enact her renunciation . S] 

The clefted element can be of a number of categories, for example NP, PP or adverb. The 
clause can also be of several types. The English XTAG grammar currently has a separate 
analysis for only a subset of the 'specificational' it-clefts¹, in particular the ones without gaps 
in the clause (e.g. (126) and (127)). It-clefts that have gaps in the clause, such as (124) and 
(125), are currently handled as relative clauses. Although arguments have been made against 
treating the clefted element and the clause as a constituent ([Delahunty, 1984]), the relative 
clause approach does capture the restriction that the clefted element must fill the gap in the 
clause, and does not require any additional trees. 

In the 'specificational' it-cleft without gaps in the clause, the clefted element has the role 
of an adjunct with respect to the clause. For these cases the English XTAG grammar requires 
additional trees. These it-cleft trees are in separate tree families because, although some 
researchers (e.g. [Akmajian, 1970]) derived it-clefts through movement from other sentence types, 
most current researchers (e.g. [Delahunty, 1984], [Knowles, 1986], [Gazdar et al., 1985], 
[Delin, 1989] and [Sornicola, 1988]) favor base-generation of the various cleft sentences. Placing the 
it-cleft trees in their own tree families is consistent with the current preference for base-generation, 
since in the XTAG English grammar, structures that would be related by transformation 
in a movement-based account appear in the same tree family. Like the base-generated 
approaches, the placement of it-clefts in separate tree families makes the claim that there is no 
derivational relation between it-clefts and other sentence types. 

1 See e.g. [Ball, 1991], [Delin, 1989] and [Delahunty, 1984] for more detailed discussion of types of it-clefts. 

The three it-cleft tree families are virtually identical except for the category label of the 
clefted element. Figure 11.1 shows the declarative tree and an inverted tree for the PP It-cleft 
tree family. 




Figure 11.1: It-cleft with PP clefted element: αItVpnx1s2 (a) and αInvItVpnx1s2 (b) 

The extra layer of tree structure in the VP represents that, while be is a main verb rather 
than an auxiliary in these cases, it retains some auxiliary properties. The VP structure for the 
equative/it-cleft be is identical to that obtained after adjunction of predicative be into small 
clauses.² The inverted tree in Figure 11.1(b) is necessary because of be's auxiliary-like behavior. 
Although be is the main verb in it-clefts, it inverts like an auxiliary. Main verb inversion cannot 
be accomplished by adjunction as is done with auxiliaries and therefore must be built into the 
tree family. The tree in Figure 11.1(b) is used for yes/no questions such as (128). 



(128) was it in the forest that the wolf talked to the little girl ? 



2 For additional discussion of equative or predicative- be see Chapter ^] 



Part III 

Sentence Types 
Chapter 12 

Passives 



In passive constructions such as (129), the subject NP is interpreted as having the same role 
as the direct object NP in the corresponding active declarative (130). 

(129) An airline buy-out bill was approved by the House. (WSJ) 

(130) The House approved an airline buy-out bill. 




Figure 12.1: Passive trees in the Sentential Complement with NP tree family: βnx1Vs2 (a), 
βnx1Vbynx0s2 (b) and βnx1Vs2bynx0 (c) 

In a movement analysis, the direct object is said to have moved to the subject position. The 
original declarative subject is either absent in the passive or is in a by-headed PP (by phrase). 
In the English XTAG grammar, passive constructions are handled by having separate trees 
within the appropriate tree families. Passive trees are found in most tree families that have a 
direct object in the declarative tree (the light verb tree families, for instance, do not contain 
passive trees). Passive trees occur in pairs: one tree with the by phrase, and another without 
it. Variations in the location of the by phrase are possible if a subcategorization includes other 
arguments such as a PP or an indirect object. Additional trees are required for these variations. 
For example, the Sentential Complement with NP tree family has three passive trees, shown 
in Figure 12.1: one without the by phrase (Figure 12.1(a)), one with the by phrase before 
the sentential complement (Figure 12.1(b)), and one with the by phrase after the sentential 
complement (Figure 12.1(c)). 

Figure 12.1(a) also shows the feature restrictions imposed on the anchor.¹ Only verbs with 
<mode>=ppart (i.e. verbs with passive morphology) can anchor this tree. The <mode> 
feature is also responsible for requiring that passive be adjoin into the tree to create a matrix 
sentence. Since a requirement is imposed that all matrix sentences must have <mode>=ind/imp, 
an auxiliary verb that selects <mode>=ppart and <passive>=+ (such as was) must adjoin 
(see the chapter on auxiliaries for more information on the auxiliary verb system). 
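
The interaction between the <mode> restriction on the anchor and the matrix-clause requirement can be sketched as follows. This is a toy illustration under stated assumptions, not the XTAG feature machinery: feature structures are reduced to flat dictionaries, and the entry for was (including the key names `selects_mode` and `selects_passive`) is hypothetical.

```python
# Illustrative sketch only: the dictionaries below are hypothetical stand-ins
# for feature structures, not the XTAG representation.

MATRIX_MODES = {"ind", "imp"}  # every matrix sentence must have one of these

# A passive tree anchored by a verb with passive morphology.
PASSIVE_CLAUSE = {"mode": "ppart", "passive": "+"}

# Assumed entry for the auxiliary "was": it selects a ppart, passive clause,
# and the clause it heads is indicative.
WAS = {"selects_mode": "ppart", "selects_passive": "+", "mode": "ind"}


def adjoin(aux: dict, clause: dict):
    """Adjoin `aux` if its selections match `clause`.

    Returns the features of the resulting clause, or None on a mismatch.
    """
    selected = (aux["selects_mode"], aux["selects_passive"])
    if selected != (clause["mode"], clause["passive"]):
        return None
    return {"mode": aux["mode"], "passive": clause["passive"]}


def is_matrix(clause: dict) -> bool:
    """Check the matrix-sentence requirement <mode>=ind/imp."""
    return clause["mode"] in MATRIX_MODES
```

In this sketch the bare passive tree fails `is_matrix`, while the result of `adjoin(WAS, PASSIVE_CLAUSE)` passes it, mirroring the requirement that passive be adjoin before the tree can serve as a matrix sentence.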



1 A reduced set of features is shown for readability. 



Chapter 13 

Extraction 



The discussion in this chapter covers constructions that are analyzed as having wh-movement 
in GB, in particular, wh-questions and topicalization. Relative clauses, which could also be 
considered extractions, are discussed in Chapter 14. 

Extraction involves a constituent appearing in a linear position to the left of the clause with 
which it is interpreted. One clause argument position is empty. For example, the position filled 
by frisbee in the declarative in sentence (131) is empty in sentence (132). The wh-item what in 
sentence (132) is of the same syntactic category as frisbee in sentence (131) and fills the same 
role with respect to the subcategorization. 

(131) Clove caught a frisbee. 



(132) What_i did Clove catch e_i? 

The English XTAG grammar represents the connection between the extracted element and 
the empty position with co-indexing (as does GB). The <trace> feature is used to implement 
the co-indexing. In extraction trees in XTAG, the 'empty' position is filled with an e. The 
extracted item always appears in these trees as a sister to the S_r tree, with both dominated by 
an S_q root node. The S_r subtrees in extraction trees have the same structure as the declarative 
tree in the same tree family. The additional structure in extraction trees of the S_q and NP 
nodes roughly corresponds to the CP and Spec of CP positions in GB. 

All sentential trees with extracted components (this does not include relative clause trees) 
are marked <extracted>=+ at the top S node, while sentential trees with no extracted 
components are marked <extracted>=-. Items that take embedded sentences, such as nouns, 
verbs and some prepositions, can place restrictions on whether the embedded sentence is allowed 
to be extracted or not. For instance, sentential subjects and sentential complements of nouns 
and prepositions are not allowed to be extracted, while certain verbs may allow extracted 
sentential complements and others may not (e.g. sentences (133)-(136)). 

(133) The jury wondered [who killed Nicole]. 



(134) The jury wondered [who Simpson killed]. 






(135) The jury thought [Simpson killed Nicole]. 



(136) *The jury thought [who did Simpson kill]? 

The <extracted> feature is also used to block embedded topicalization in infinitival comple- 
ment clauses as exemplified in (137). 

(137) *John wants [Bill_i [PRO to see t_i]] 

Verbs such as want that take non-wh infinitival complements specify that the <extracted> 
feature of their complement clause (i.e. of the foot S node) is -. Clauses that involve 
topicalization have + as the value of their <extracted> feature (i.e. of the root S node). Sentences 
like (137) are thus ruled out. 
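
The way the <extracted> feature filters embedded clauses can be illustrated with a minimal sketch. It is not the XTAG implementation: unification is reduced to equality of atomic values, and the table of per-verb values is an assumption read off examples (133)-(137) (in particular, wonder is given + only because (133) and (134) show extracted complements).

```python
# Illustrative sketch only: the per-verb values are assumptions based on
# examples (133)-(137), not entries from the XTAG lexicon.

# Value each verb places on the <extracted> feature of its complement clause.
COMPLEMENT_EXTRACTED = {
    "want": "-",    # non-wh infinitival complement, cf. (137)
    "think": "-",   # cf. (135) vs. (136)
    "wonder": "+",  # cf. (133), (134)
}


def unify_atomic(a: str, b: str):
    """Unify two atomic feature values: succeed only if they are identical."""
    return a if a == b else None


def complement_ok(verb: str, clause_extracted: str) -> bool:
    """True if the clause's root <extracted> value unifies with the verb's."""
    return unify_atomic(COMPLEMENT_EXTRACTED[verb], clause_extracted) is not None
```

A topicalized clause carries <extracted>=+ at its root, so `complement_ok("want", "+")` fails, which is the pattern that excludes (137).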



Figure 13.1: Transitive tree with object extraction: αW1nx0Vnx1 
Feature annotations shown in the figure: S_q: invlink:<1>, inv:<1>, extracted:+, wh:<5>; 
extracted NP: case:<2>, agr:<3>, trace:<4>, wh:<5>; S_r: inv:<1>, inv:-; 
empty NP_1: case:acc, case:<2>, agr:<3>, trace:<4>. 

The tree that is used to derive the embedded sentence in (134) in the English XTAG 
grammar is shown in Figure 13.1.¹ The important features of extracted trees are: 

• The subtree that has S_r as its root is identical to the declarative tree or a non-extracted 
passive tree, except for having one NP position in the VP filled by e. 

• The root S node is S_q, which dominates NP and S_r. 

• The <trace> feature of the e-filled NP is co-indexed with the <trace> feature of the 
NP daughter of S_q. 

• The <case> and <agr> features are passed from the empty NP to the extracted NP. This 
is particularly important for extractions from subject NP's, since <case> can continue 
to be assigned from the verb to the subject NP position, and from there be passed to the 
extracted NP. 

• The <inv> feature of S_r is co-indexed to the <wh> feature of NP through the use of 
the <invlink> feature in order to force subject-auxiliary inversion where needed (see 
section 13.1 for more discussion of the <inv>/<wh> co-indexing and the use of these 
trees for topicalization). 

1 Features not pertaining to this discussion have been taken out to improve readability. 



13.1 Topicalization and the value of the <inv> feature 

Our analysis of topicalization uses the same trees as wh-extraction. For any NP complement 
position a single tree is used for both wh-questions and for topicalization from that position. 
Wh-questions have subject-auxiliary inversion and topicalizations do not. This difference 
between the constructions is captured by equating the values of the S_r's <inv> feature and the 
extracted NP's <wh> feature. This means that if the extracted item is a wh-expression, as in 
wh-questions, the value of <inv> will be + and an inverted auxiliary will be forced to adjoin. 
If the extracted item is a non-wh, <inv> will be - and no auxiliary adjunction will occur. An 
additional complication is that inversion only occurs in matrix clauses, so the values of <inv> 
and <wh> should only be equated in matrix clauses and not in embedded clauses. In the 
English XTAG grammar, appropriate equating of the <inv> and <wh> features is 
accomplished using the <invlink> feature and a restriction imposed on the root S of a derivation. In 
particular, in extraction trees that are used for both wh-questions and topicalizations, the value 
of the <inv> feature for the top of the S_r node is co-indexed to the value of the <inv> feature 
on the bottom of the S_q node. On the bottom of the S_q node the <inv> feature is co-indexed 
to the <invlink> feature. The <wh> feature of the extracted NP node is co-indexed to the 
value of the <wh> feature on the bottom of S_q. The linking between the values of the S_q <wh> 
and <invlink> features is imposed by a condition on the final root node of a derivation 
(i.e. the top S node of a matrix clause), which requires that <invlink>=<wh>. For example, the 
tree in Figure 13.1 is used to derive both (138) and (139). 



(138) John, I like. 



(139) Who do you like? 

For the question in (139), the extracted item who has the feature value <wh>=+, so the 
value of the <inv> feature on VP is also + and an auxiliary, in this case do, is forced to adjoin. 
For the topicalization (138) the values for John's <wh> feature and for S_q's <inv> feature 
are both - and no auxiliary adjoins. 
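
The chain of co-indexations described in this section can be traced with a small unification sketch. It is an illustration under simplifying assumptions, not the XTAG implementation: each feature is modeled as a variable in a union-find structure, the names `Var`, `link`, `bind` and `derive_inv` are hypothetical, and the matrix-clause condition <invlink>=<wh> is applied as one extra link.

```python
# Illustrative sketch only: a minimal union-find model of co-indexed
# features. All class and function names are hypothetical.


class Var:
    """A feature variable; co-indexed variables share one representative."""

    def __init__(self):
        self.parent = self
        self.value = None


def find(v: Var) -> Var:
    """Return the representative of the class that `v` belongs to."""
    while v.parent is not v:
        v = v.parent
    return v


def bind(v: Var, value: str) -> None:
    """Give a value to `v` (and to everything co-indexed with it)."""
    root = find(v)
    if root.value is not None and root.value != value:
        raise ValueError("feature clash")
    root.value = value


def link(a: Var, b: Var) -> None:
    """Co-index two features, failing if they already hold different values."""
    ra, rb = find(a), find(b)
    if ra is rb:
        return
    if ra.value is not None and rb.value is not None and ra.value != rb.value:
        raise ValueError("feature clash")
    if rb.value is None:
        rb.value = ra.value
    ra.parent = rb


def derive_inv(extracted_wh: str, is_matrix: bool):
    """Return the value forced on <inv> at the top of S_r, or None if unset."""
    inv_sr, inv_sq, invlink_sq, wh_sq, wh_np = (Var() for _ in range(5))
    link(inv_sr, inv_sq)      # S_r top <inv> = S_q bottom <inv>
    link(inv_sq, invlink_sq)  # S_q bottom <inv> = S_q bottom <invlink>
    link(wh_np, wh_sq)        # extracted NP <wh> = S_q bottom <wh>
    bind(wh_np, extracted_wh)
    if is_matrix:
        link(invlink_sq, wh_sq)  # root condition: <invlink> = <wh>
    return find(inv_sr).value
```

With these assumptions, `derive_inv("+", True)` yields "+" (inversion forced, as in (139)), `derive_inv("-", True)` yields "-" (no inversion, as in (138)), and in an embedded clause the <inv> value is left unconstrained by the extracted item.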



13.2 Extracted subjects 

The extracted subject trees provide for sentences like (140)-(142), depending on the tree family 
with which they are associated. 

(140) Who left? 

(141) Who wrote the paper? 

(142) Who was happy? 

Wh-questions on subjects differ from other argument extractions in not having 
subject-auxiliary inversion. This means that in subject wh-questions the linear order of the constituents 
is the same as in declaratives, so it is difficult to tell whether the subject has moved out of position 
or not (see [Heycock and Kroch, 1993] for arguments for and against a moved subject). 



The English XTAG treatment of subject extractions assumes the following: 

• Syntactic subject topicalizations don't exist; and 

• Subjects in wh-questions are extracted rather than in situ. 

The assumption that there is no syntactic subject topicalization is reasonable in English 
since there is no convincing syntactic evidence and since the interpretability of subjects as 
topics seems to be mainly affected by discourse and intonational factors rather than syntactic 
structure. As for the assumption that wh-question subjects are extracted, these questions seem 
to have more similarities to other extractions than to the two cases in English that have been 
considered in situ wh: multiple wh questions and echo questions. In multiple wh questions such 
as sentence (143), one of the wh-items is blocked from moving sentence initially because the 
first wh-item already occupies the location to which it would move. 

(143) Who ate what? 

This type of 'blocking' account is not applicable to subject wh-questions because there is 
no obvious candidate to do the blocking. Similarity between subject wh-questions and echo 
questions is also lacking. At least one account of echo questions ([Hockey, 1994]) argues that 
echo questions are not ordinary wh-questions at all, but rather focus constructions in which the 
wh-item is the focus. Clearly, this is not applicable to subject wh-questions. So it seems that 
treating subject wh-questions similarly to other wh-extractions is more justified than an in situ 
treatment. 

Given these assumptions, there must be separate trees in each tree family for subject 
extractions. The declarative tree cannot be used even though the linear order is the same because the 
structure is different. Since topicalizations are not allowed, the <wh> feature for the extracted 
NP node is set in these trees to +. The lack of subject-auxiliary inversion is handled by the 
absence of the <invlink> feature. Without the presence of this feature, the <wh> and <inv> 
features are never linked, so inversion cannot occur. Like other wh-extractions, the S_q node is marked 
<extracted>=+ to constrain the occurrence of these trees in embedded sentences. The tree 
in Figure 13.2 is an example of a subject wh-question tree. 
Figure 13.2: Intransitive tree with subject extraction: αW0nx0V 

13.3 Wh-moved NP complement 

Wh-questions can be formed on every NP object or indirect object that appears in the 
declarative tree or in the passive trees, as seen in sentences (144)-(149). A tree family will contain 
one tree for each of these possible NP complement positions. Figure 13.3 shows the two 
extraction trees from the ditransitive tree family for the extraction of the direct (Figure 13.3(a)) and 
indirect object (Figure 13.3(b)). 



(144) Dania asked Beth a question. 

(145) Who_i did Dania ask e_i a question? 

(146) What_i did Dania ask Beth e_i? 

(147) Beth was asked a question by Dania. 

(148) Who_i was Beth asked a question by e_i? 

(149) What_i was Beth asked e_i by Dania? 



Figure 13.3: Ditransitive trees with direct object extraction: αW1nx0Vnx1nx2 (a) and indirect object 
extraction: αW2nx0Vnx1nx2 (b) 

13.4 Wh-moved object of a P 

Wh-questions can be formed on the NP object of a complement PP as in sentence (150). 

(150) [Which dog]_i did Beth Ann give a bone to e_i? 

The by phrases of passives behave like complements and can undergo the same type of 
extraction, as in (151). 

(151) [Which dog]_i was the frisbee caught by e_i? 

Tree structures for this type of sentence are very similar to those for the wh-extraction of NP 
complements discussed in section 13.3 and have the identical important features related to tree 
structure and trace and inversion features. The tree in Figure 13.4 is an example of this type 
of tree. Topicalization of NP objects of prepositions is handled the same way as topicalization 
of complement NP's. 



13.5 Wh-moved PP 

Like NP complements, PP complements can be extracted to form wh-questions, as in sentence 
(152). 

(152) [To which dog]_i did Beth Ann throw the frisbee e_i? 



As can be seen in the tree in Figure 13.5, extraction of PP complements is very similar to 
extraction of NP complements from the same positions. 

The PP extraction trees differ from NP extraction trees in having a PP rather than an NP 
left daughter node under S_q and in having the e fill a PP rather than an NP position in the 
VP. In other respects these PP extraction structures behave like the NP extractions, including 
being used for topicalization. 







Figure 13.4: Ditransitive with PP tree with the object of the PP extracted: αW2nx0Vnx1pnx2 




Figure 13.5: Ditransitive with PP with PP extraction tree: αpW2nx0Vnx1pnx2 

13.6 Wh-moved S complement 

Except for the node label on the extracted position, the trees for wh-questions on S complements 
look exactly like the trees for wh-questions on NP's in the same positions. This is because there 
is no separate wh-lexical item for clauses in English, so the item what is ambiguous between 
representing a clause or an NP. To illustrate this ambiguity notice that the question in (153) 
could be answered by either a clause as in (154) or an NP as in (155). The extracted NP in 
these trees is constrained to be <wh>=+, since sentential complements can not be topicalized. 

(153) What does Clove want? 

(154) for Beth Ann to play frisbee with her 

(155) a biscuit 



13.7 Wh-moved Adjective complement 

In subcategorizations that select an adjective complement, that complement can be questioned 
in a wh-question, as in sentence (156). 

(156) How_i did he feel e_i? 




Figure 13.6: Predicative Adjective tree with extracted adjective: αWA1nx0Vax1 

The tree families with adjective complements include trees for such adjective extractions 
that are very similar to the wh-extraction trees for other categories of complements. The 
adjective position in the VP is filled by an e, and the trace feature of the adjective complement 
and of the adjective daughter of S_q are co-indexed. The extracted adjective is required to be 
<wh>=+², so no topicalizations are allowed. An example of this type of tree is shown in 
Figure 13.6. 



2 How is the only <wh>=+ adjective currently in the XTAG English grammar. 



Chapter 14 

Relative Clauses 



Relative clauses are NP modifiers, which involve extraction of an argument or an adjunct. The 
NP head (the portion of the NP being modified by the relative clause) is not directly related to 
the extracted element. For example in (157), the person is the head NP and is modified by the 
relative clause whose mother e likes Chris. The person is not interpreted as the subject of the 
relative clause which is missing an overt subject. In other cases, such as (158), the relationship 
between the head NP export exhibitions may seem to be more direct but even there we assume 
that there are two independent relationships: one between the entire relative clause and the NP 
it modifies, and another between the extracted element and its trace. The extracted element 
may be an overt wh-phrase as in (157) or a covert element as in (158). 

(157) the person whose mother likes Chris 



(158) export exhibitions that included high-tech items 

Relative clauses are represented in the English XTAG grammar by auxiliary trees that adjoin 
to NP's. These trees are anchored by the verb in the clause and appear in the appropriate tree 
families for the various verb subcategorizations. Within a tree family there will be groups of 
relative clause trees based on the declarative tree and each passive tree. Within each of these 
groups, there is a separate relative clause tree corresponding to each possible argument that can 
be extracted from the clause. There is no relationship between the extracted position and the 
head NP. The relationship between the relative clause and the head NP is treated as a semantic 
relationship which will be provided by any reasonable compositional theory. The relationship 
between the extracted element (which can be covert) and its trace is captured by co-indexing the <trace> 
features of the extracted NP and the NP_w node in the relative clause tree. If, for example, it is 
NP_0 that is extracted, we have the following feature equations:¹ 
NP_w.t:<trace> = NP_0.t:<trace> 
NP_w.t:<case> = NP_0.t:<case> 
NP_w.t:<agr> = NP_0.t:<agr> 

1 No adjunct traces are represented in the XTAG analysis of adjunct extraction. Relative clauses on adjuncts 
do not have traces and consequently feature equations of the kind shown here are not present. 
Representative examples from the transitive tree family are shown with a relevant subset 
of their features in Figures 14.1(a) and 14.1(b). Figure 14.1(a) involves a relative clause with a 
covert extracted element, while Figure 14.1(b) involves a relative clause with an overt wh-phrase.² 




Figure 14.1: Relative clause trees in the transitive tree family: βNc1nx0Vnx1 (a) and 
βN0nx0Vnx1 (b) 

The above analysis is essentially identical to the GB analysis of relative clauses. One aspect 
of its implementation is that a covert +<wh> NP and a covert Comp have to be introduced. 
See (159) and (160) for example. 

(159) export exhibitions [[NP_w e]_i [that [e_i included high-tech items]]] 

(160) the export exhibition [[NP_w e]_i [ec [Muriel planned e_i]]] 

The lexicalized nature of XTAG makes it problematic to have trees headed by null strings. 
Of the two null trees, NP_w and Comp, that we could postulate, the former is definitely more 
undesirable because it would lead to massive overgeneration, as can be seen in (161) and (162). 

(161) *[NP_w e] did John eat the apple? (as a wh-question) 



2 The convention followed in naming relative clause trees is outlined in Appendix 
(162) *I wonder [[NP_w e] Mary likes John] (as an indirect question) 

The presence of an initial tree headed by a null Comp does not lead to problems of overgeneration 
because relative clauses are the only environment with a Comp substitution node.³ 

Consequently, our treatment of relative clauses has different trees to handle relative clauses 
with an overt extracted wh-NP and relative clauses with a covert extracted wh-NP. Relative 
clauses with an overt extracted wh-NP involve substitution of a +<wh> NP into the NP_w 
node⁴ and have a Comp node headed by ec built in. Relative clauses with a covert extracted 
wh-NP have an NP_w node headed by e_w built in and involve substitution into the Comp node. 
The Comp node that is introduced by substitution can be ec (the null complementizer), that, 
or for. 

For example, the tree shown in Figure 14.1(b) is used for the relative clauses shown in 
sentences (163)-(164), while the tree shown in Figure 14.1(a) is used for the relative clauses in 
sentences (165)-(168). 

(163) the man who Muriel likes 



(164) the man whose mother Muriel likes 



(165) the man Muriel likes 



(166) the book for Muriel to read 



(167) the man that Muriel likes 



(168) the book Muriel is reading 



Cases of PP pied-piping (cf. (169)) are handled in a similar fashion by building in a PP_w 
node. 

(169) the demon by whom Muriel was chased 

See the tree in Figure 14.2. 

3 Complementizers in clausal complementation are introduced by adjunction. See section 8.4. 
4 The feature equation used is NP_w.t:<wh> = +. Examples of NPs that could substitute under NP_w are 
whose mother, who, whom, and also which, but not when and where, which are treated as exhaustive +wh PPs. 
Figure 14.2: Adjunct relative clause tree with PP-pied-piping in the transitive tree family: 
βNpxnx0Vnx1 

14.1 Complementizers and clauses 

The co-occurrence constraints that exist between various Comps and the clause type of the 
clause they occur with are implemented through combinations of different clause types using 
the <mode> feature, the <select-mode> feature, and the <rel-pron> feature. 

Clauses are specified for the <mode> feature which indicates the clause type of that clause. 
Possible values for the <mode> feature are ind, inf, ppart, ger etc. 

Comps are lexically specified for a feature named <select-mode>. In addition, the 
<select-mode> feature of the Comp is equated with the <mode> feature of its complement 
S by the following equation: 
S_r.t:<mode> = Comp.t:<select-mode> 

The lexical specifications of the Comps are shown below: 

• ec, Comp.t:<select-mode> = ind/inf/ger/ppart 

• that, Comp.t:<select-mode> = ind 

• for, Comp.t:<select-mode> = inf 

The following examples display the co-occurrence constraints which the <select-mode> 
specifications assigned above implement. 
For ec: 

(170) the book Muriel likes (S.t:<mode>= ind) 

(171) a book to like (S.t:<mode>= inf) 

(172) the girl reading the book (S.t:<mode>= ger) 

(173) the book read by Muriel (S.t:<mode>= ppart) 

For for: 

(174) *the book for Muriel likes (S.t:<mode>= ind) 

(175) a book for Mary to like (S.t:<mode>= inf) 

(176) *the girl for reading the book (S.t:<mode>= ger) 

(177) *the book for read by Muriel (S.t:<mode>= ppart) 

For that: 

(178) the book that Muriel likes (S.t:<mode>= ind) 

(179) *a book that (Muriel) to like (S.t:<mode>= inf) 

(180) *the girl that reading the book (S.t:<mode>= ger) 

(181) *the book that read by Muriel (S.t:<mode>= ppart) 
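
The paradigm in (170)-(181) follows directly from the <select-mode> specifications listed above. As a quick check, the sketch below encodes those three specifications as sets and tests each example against them. It is an illustration only: the function name and the table of examples are introduced here for convenience and are not part of the grammar.

```python
# Illustrative sketch only: the <select-mode> values are those given in the
# text; the function and table names are hypothetical.

COMP_SELECT_MODE = {
    "ec": {"ind", "inf", "ger", "ppart"},
    "that": {"ind"},
    "for": {"inf"},
}


def comp_allows(comp: str, mode: str) -> bool:
    """True if the clause <mode> unifies with the Comp's <select-mode>."""
    return mode in COMP_SELECT_MODE[comp]


# (example number, Comp, clause mode, judged grammatical in the text)
EXAMPLES = [
    (170, "ec", "ind", True), (171, "ec", "inf", True),
    (172, "ec", "ger", True), (173, "ec", "ppart", True),
    (174, "for", "ind", False), (175, "for", "inf", True),
    (176, "for", "ger", False), (177, "for", "ppart", False),
    (178, "that", "ind", True), (179, "that", "inf", False),
    (180, "that", "ger", False), (181, "that", "ppart", False),
]

if __name__ == "__main__":
    for number, comp, mode, grammatical in EXAMPLES:
        agrees = comp_allows(comp, mode) == grammatical
        print(number, comp, mode, "agrees" if agrees else "MISMATCH")
```

Every row agrees with the judgments reported for (170)-(181).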

Relative clause trees that have substitution of NP_w have the following feature equations: 
S_r.t:<mode> = NP_w.t:<select-mode> 
NP_w.t:<select-mode> = ind 

The examples that follow are intended to provide the rationale for the above setting of 
features. 

(182) the boy whose mother chased the cat (S_r.t:<mode> = ind) 

(183) *the boy whose mother to chase the cat (S_r.t:<mode> = inf) 

(184) *the boy whose mother eaten the cake (S_r.t:<mode> = ppart) 

(185) *the boy whose mother chasing the cat (S_r.t:<mode> = ger) 

(186) the boy [whose mother]_i Bill believes e_i to chase the cat 
(S_r.t:<mode> = ind) 

The feature equations that appear in trees which have substitution of PP_w⁵ are: 
S_r.t:<mode> = PP_w.t:<select-mode> 
PP_w.t:<select-mode> = ind/inf 

Examples that justify the above feature setting follow. 

(187) the person [by whom] this machine was invented (S_r.t:<mode> = ind) 

(188) a baker [in whom]_i PRO to trust (S_r.t:<mode> = inf) 

(189) *the fork [with which] (Geoffrey) eaten the pudding (S_r.t:<mode> = ppart) 

(190) *the person [by whom] (this machine) inventing (S_r.t:<mode> = ger) 

14.1.1 Further constraints on the null Comp ec 

There are additional constraints on where the null Comp ec can occur. The null Comp is not 
permitted in cases of subject extraction unless there is an intervening clause or the relative 
clause is a reduced relative (mode = ppart/ger). This can be seen in (191)-(194). 

(191) *the toy [e_i [ec [e_i likes Dafna]]] 

(192) the toy [e_i [ec Fred thinks [e_i likes Dafna]]] 

(193) the boy [e_i [ec [e_i eating the guava]]] 

(194) the guava [e_i [ec [e_i eaten by the boy]]] 

5 As is the case for NP_w substitution, any +wh-PP can substitute under PP_w. This is implemented by the 
following equation: 
PP_w.t:<wh> = + 

Not all cases of pied-piping involve substitution of PP_w. In some cases, the P may be built in. In cases where 
part of the pied-piped PP is part of the anchor, it continues to function as an anchor even after pied-piping, i.e. 
the P node and the NP_w nodes are represented separately. 
To model this paradigm, the feature <rel-pron> is used in conjunction with the following 
equations: 

• S_r.t:<rel-pron> = Comp.t:<rel-pron> 

• S_r.b:<rel-pron> = S_r.b:<mode> 

• Comp.b:<rel-pron> = ppart/ger/adj-clause (for ec) 

The full set of the equations shown above is only present in Comp substitution trees involving 
subject extraction. So (195) will not be ruled out. 

(195) the toy [e$ [ec [ Dafna likes o L ]]] 

The feature mismatch induced by the above equations is not remedied by adjunction of just 
any S-adjunct because all other S-adjuncts are transparent to the (rel-pron) feature because 
of the following equation: 
S m .b: (rel-pron) = S/.t: (rel-pron) 
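
The effect of the <rel-pron> equations can be illustrated with a small sketch. It is not the XTAG implementation: it assumes that disjunctive atomic values can be modelled as Python sets unified by intersection, and that adjoining a sentential-complement-taking clause at S_r simply keeps S_r.t and S_r.b from unifying. The names unify and null_comp_ok are hypothetical.

```python
# Illustrative sketch only; names and the set-based encoding are assumptions.

def unify(a, b):
    """Unify two disjunctive atomic values given as sets; None signals a clash."""
    common = a & b
    return common or None

# Comp.b:<rel-pron> for the null Comp, as given in the equations above.
NULL_COMP_REL_PRON = {"ppart", "ger", "adj-clause"}

def null_comp_ok(mode, intervening_clause=False):
    """Check a subject-extraction relative clause headed by the null Comp.

    mode is the value of S_r.b:<mode>. intervening_clause is True when a clause
    such as "Fred thinks" adjoins at S_r; by assumption this separates S_r.t
    from S_r.b, so the two halves are never unified with each other.
    """
    s_r_top = NULL_COMP_REL_PRON   # S_r.t:<rel-pron> = Comp.t:<rel-pron>
    s_r_bottom = {mode}            # S_r.b:<rel-pron> = S_r.b:<mode>
    if intervening_clause:
        return True
    return unify(s_r_top, s_r_bottom) is not None

print(null_comp_ok("ind"))                           # pattern of (191)
print(null_comp_ok("ind", intervening_clause=True))  # pattern of (192)
print(null_comp_ok("ger"), null_comp_ok("ppart"))    # patterns of (193), (194)
```

Under these assumptions the indicative case fails because ind is not among the values the null Comp allows, while the reduced relatives and the embedded case go through.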

14.2 Reduced Relatives 

Reduced relatives are permitted only in cases of subject-extraction. Past participial reduced 
relatives are only permitted on passive clauses. See (196-203). 

(196) the toy [ε_i [ε_C [ε_i playing the banjo]]]

(197) *the instrument [ε_i [ε_C [Amis playing ε_i]]]

(198) *the day [ε_w [ε_C [Amis playing the banjo]]]

(199) the apple [ε_i [ε_C [ε_i eaten by Dafna]]]

(200) *the child [ε_i [ε_C [the apple eaten by ε_i]]]

(201) *the day [ε_w [ε_C [Amis eaten the apple]]]

(202) *the apple [ε_i [ε_C [Dafna eaten ε_i]]]

(203) *the child [ε_i [ε_C [ε_i eaten the apple]]]

These restrictions are built into the <mode> specifications of S_r.t. So non-passive cases of
subject extraction have the following feature equation:
S_r.t:<mode> = ind/ger/inf

Passive cases of subject extraction have the following feature equation:
S_r.t:<mode> = ind/ger/ppart/inf

Finally, all cases of non-subject extraction have the following feature equation:
S_r.t:<mode> = ind/inf
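
The three <mode> equations above amount to a small lookup from extraction type to licensed modes. The sketch below encodes that lookup so the judgments in (196)-(203) can be checked mechanically; the function names and boolean parameters are hypothetical and are not part of the XTAG system.

```python
# Illustrative sketch only; function and parameter names are assumptions.

def licensed_modes(subject_extraction, passive=False):
    """Return the values allowed for S_r.t:<mode> in a relative clause tree."""
    if not subject_extraction:
        return {"ind", "inf"}          # all non-subject extraction
    modes = {"ind", "ger", "inf"}      # non-passive subject extraction
    if passive:
        modes.add("ppart")             # passive subject extraction
    return modes

def is_licensed(mode, subject_extraction, passive=False):
    """True if a clause in the given mode can head this kind of relative."""
    return mode in licensed_modes(subject_extraction, passive)

print(is_licensed("ger", subject_extraction=True))                  # cf. (196)
print(is_licensed("ger", subject_extraction=False))                 # cf. (197)
print(is_licensed("ppart", subject_extraction=True, passive=True))  # cf. (199)
print(is_licensed("ppart", subject_extraction=True))                # cf. (203)
```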

14.2.1 Restrictive vs. Non-restrictive relatives

The English XTAG grammar does not contain any syntactic distinction between restrictive and 
non-restrictive relatives because we believe this to be a semantic and/or pragmatic difference. 

14.3 External syntax 

A relative clause can combine with the NP it modifies in at least the following two ways: 

(204) [the [toy [ε_i [ε_C [Dafna likes ε_i]]]]]

(205) [[the toy] [ε_i [ε_C [Dafna likes ε_i]]]]

Based on cases like (206) and (207), which are problematic for the structure in (204), the
structure in (205) is adopted.

(206) [[the man and the woman] [who met on the bus]]

(207) [[the man and the woman] [who like each other]]

As it stands, the RC analysis sketched so far will combine in two ways with the Determiner
tree shown in Figure 14.3,⁶ giving us both the possibilities shown in (204) and (205). In order
to block the structure exemplified in (204), the feature <rel-clause> is used in combination with
the following equations.

On the RC:
NP_r.b:<rel-clause> = +

On the Determiner tree:
NP_f.t:<rel-clause> = -

Together, these equations block introduction of the determiner above the relative clause.
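
A minimal sketch of how the two <rel-clause> equations interact is given below. It assumes that feature structures can be represented as flat Python dictionaries, that an absent key means the feature is unspecified, and that a bare noun NP leaves <rel-clause> unspecified; all names are hypothetical.

```python
# Illustrative sketch only; the dictionary encoding is an assumption.

DET_FOOT_TOP = {"rel-clause": "-"}    # NP_f.t:<rel-clause> = - on the determiner tree
RC_ROOT_BOTTOM = {"rel-clause": "+"}  # NP_r.b:<rel-clause> = + on the relative clause tree
PLAIN_NP_BOTTOM = {}                  # assumed: a bare noun NP leaves the feature open

def compatible(required, found):
    """True if every feature in `required` is either absent from or equal in `found`."""
    return all(found.get(name, value) == value for name, value in required.items())

def determiner_can_adjoin(node_bottom):
    """The determiner's foot must unify with the bottom of the node it adjoins to."""
    return compatible(DET_FOOT_TOP, node_bottom)

print(determiner_can_adjoin(PLAIN_NP_BOTTOM))  # [the toy], as needed for (205)
print(determiner_can_adjoin(RC_ROOT_BOTTOM))   # determiner above the RC, as in (204)
```

Under these assumptions the determiner adjoins freely to a plain NP but clashes with the root of a relative clause tree, which is the configuration in (204).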



14.4 Other Issues 

14.4.1 Interaction with adjoined Comps 

The XTAG analysis now has two different ways of introducing a complementizer like that or for,
depending upon whether it occurs in a relative clause or in sentential complementation. Relative
clause complementizers substitute in (using the tree αComp), while sentential complementizers
adjoin in (using the tree βCOMPs). Cases like (208) where both kinds of complementizers
illicitly occur together are blocked.

(208) *the book [ε_{w_i} [that [that [Muriel wrote ε_i]]]]

6 The determiner tree shown has the <rel-clause> feature built in. The RC analysis would give two parses 
in the absence of this feature. 

[Figure 14.3: Determiner tree with <rel-clause> feature: βDnx, shown anchored by the. Feature values legible in the figure include case : nom/acc, rel-clause : -, gerund : -, wh : -, quan : -, gen : -, definite : -, decreas : -, const : - and card : -.]



This is accomplished by setting the S_r.t:<comp> feature in the relative clause tree to nil.
The S_r.t:<comp> feature of the auxiliary tree that introduces (the sentential complementation)
that is set to that. This leads to a feature clash, ruling out (208). On the other hand, if a
sentential-complement-taking verb is adjoined in at S_r, this feature clash goes away (cf. 209).

(209) the book [ε_{w_i} [that Beth thinks [that [Muriel wrote ε_i]]]]

14.4.2 Adjunction on PRO 

Adjunction on PRO, which would yield the ungrammatical (210) is blocked. 

(210) *I want [[PRO [who Muriel likes] to read a book]]. 

This is done by specifying the <case> feature of NP_f to be nom/acc. The <case> feature
of PRO is null. This leads to a feature clash and blocks adjunction of relative clauses on to
PRO.

14.4.3 Adjunct relative clauses 

Two types of trees to handle adjunct relative clauses exist in the XTAG grammar: one in which
there is PP_w substitution with a null Comp built in and one in which there is a null NP_w
built in and a Comp substitutes in. There is no NP_w substitution tree with a null Comp
built in. This is because of the contrast between (211) and (212).

(211) the day [[on whose predecessor] [ε_C [Muriel left]]]

(212) *the day [[whose predecessor] [ε_C [Muriel left]]]

In general, adjunct relatives are not possible with an overt NP_w. We do not consider (213)
and (214) to be counterexamples to the above statements because we consider where and when
to be exhaustive PPs that head a PP initial tree.

(213) the place [where [ε_C [Muriel wrote her first book]]]

(214) the time [when [ε_C [Muriel lived in Bryn Mawr]]]

14.4.4 ECM

Cases where for assigns exceptional case (cf. 215, 216) are handled.

(215) a book [ε_{w_i} [for [Muriel to read ε_i]]]

(216) the time [ε_{w_i} [for [Muriel to leave Haverford]]]

The assignment of case by for is implemented by a combination of the following equations.
Comp.t:<assign-case> = acc
S_r.t:<assign-case> = Comp.t:<assign-case>
S_r.b:<assign-case> = NP_0.t:<case>
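
The three equations form a chain that carries the value acc from the complementizer down to the subject. The sketch below traces that chain, assuming that nothing adjoins at S_r (so its top and bottom halves unify) and that case values are sets of alternatives; the lexical case sets used for the sample subjects are illustrative assumptions, not entries from the XTAG lexicon.

```python
# Illustrative sketch only; the lexical case sets below are assumptions.

def subject_case_under_for(subject_cases):
    """Return the case values left for the subject NP_0 when Comp is `for`.

    An empty result means the subject clashes with the case assigned by `for`.
    """
    comp_top = {"acc"}     # Comp.t:<assign-case> = acc
    s_r_top = comp_top     # S_r.t:<assign-case> = Comp.t:<assign-case>
    s_r_bottom = s_r_top   # top and bottom of S_r unify when nothing adjoins there
    return s_r_bottom & subject_cases   # S_r.b:<assign-case> = NP_0.t:<case>

print(subject_case_under_for({"nom", "acc"}))  # a subject like "Muriel" in (215)
print(subject_case_under_for({"acc"}))         # an accusative pronoun
print(subject_case_under_for({"nom"}))         # a nominative pronoun: no value survives
```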

14.5 Cases not handled 

14.5.1 Partial treatment of free-relatives 

Free relatives are only partially handled. All free relatives on non-subject positions and some 
free relatives on subject positions are handled. The structure assigned to free relatives treats the 
extracted wh-NP as the head NP of the relative clause. The remaining relative clause modifies 
this extracted wh-NP (cf. 217-219). 

(217) what(ever) [ε_{w_i} [ε_C [Mary likes ε_i]]]

(218) where(ever) [ε_w [ε_C [Mary lives]]]

(219) who(ever) [ε_{w_i} [ε_C [Muriel thinks [ε_i likes Mary]]]]

However, simple subject extractions without further embedding are not handled (cf. 220).

(220) who(ever) [ε_{w_i} [ε_C [ε_i likes Bill]]]

This is because (220) is treated exactly like the ungrammatical (221).

(221) *the person [ε_{w_i} [ε_C [ε_i likes Bill]]]

14.5.2 Adjunct P-stranding

The following cases of adjunct preposition stranding are not handled (cf. 222, 223).

(222) the pen Muriel wrote this letter with

(223) the street Muriel lives on

Adjuncts are not built into elementary trees in XTAG. So there is no clean way to represent
adjunct preposition stranding. A better solution is probably available if we make use of
multi-component adjunction.

14.5.3 Overgeneration 

The following ungrammatical sentences are currently being accepted by the XTAG grammar. 
This is because no clean and conceptually attractive way of ruling them out is obvious to us. 

14.5.3.1 how as wh-NP 

In standard American English, how is not acceptable as a relative pronoun (cf. 224). 

(224) *the way [how [ε_C [PRO to solve this problem]]]

However, (224) is accepted by the current grammar. The only way to rule (224) out would 
be to introduce a special feature devoted to this purpose. This is unappealing. Further, there 
exist speech registers/dialects of English, where (224) is acceptable. 

14.5.3.2 for-trace effects

(225) is ungrammatical, being an instance of a violation of the for-trace filter of early
transformational grammar.

(225) *the person [ε_{w_i} [for [ε_i to read the book]]]

The XTAG grammar currently accepts (225).⁷

14.5.3.3 Internal head constraint 

Relative clauses in English (and in an overwhelming number of languages) obey a 'no internal 
head' constraint. This constraint is exemplified in the contrast between (226) and (227). 

(226) the person [who_i [ε_C Muriel likes ε_i]]

(227) *the person [[which person]_i [ε_C Muriel likes ε_i]]

We know of no good way to rule (227) out, while still ruling (228) in. 
7 It may be of some interest that (225) is acceptable in certain dialects of Belfast English. 

(228) the person [[whose mother]_i [ε_C Muriel likes ε_i]]

Dayal (1996) suggests that 'full' NPs such as which person and whose mother are
R-expressions while who and whose are pronouns. R-expressions, unlike pronouns, are subject
to Condition C. (227) is, then, ruled out as a violation of Condition C since the person and
which person are co-indexed and the person c-commands which person. If we accept Dayal's
argument, we have a principled reason for allowing overgeneration of relative clauses that violate
the internal head constraint, the reason being that the XTAG grammar does generate binding
theory violations.

14.5.3.4 Overt Comp constraint on stacked relatives 

Stacked relatives of the kind in (229) are handled. 

(229) [[the book [that Bill likes]] [which Mary wrote]] 

There is a constraint on stacked relatives: all but the relative clause closest to the head-NP
must have either an overt Comp or an overt NP_w. Thus (230) is ungrammatical.

(230) *[[the book [that Bill likes]] [Mary wrote]] 



Again, no good way of handling this constraint is known to us currently. 



Chapter 15 

Adjunct Clauses 



Adjunct clauses include subordinate clauses (i.e. those with overt subordinating conjunctions), 
purpose clauses and participial adjuncts. 

Subordinating conjunctions each select four trees, allowing them to appear in four different
positions relative to the matrix clause. The positions are (1) before the matrix clause, (2) after
the matrix clause, (3) before the VP, surrounded by two punctuation marks, and (4) after the
matrix clause, separated by a punctuation mark. Each of these trees is shown in Figure 15.1.

[Figure 15.1: Auxiliary Trees for Subordinating Conjunctions. Panels as labelled in the source: (1) PPss, (2) βvxPNs, (3) βpuPPspuvx, (4) βspuPs; anchors shown: because, in order, as if, when.]

Sentence-initial adjuncts adjoin at the root S of the matrix clause, while sentence-final
adjuncts adjoin at a VP node. In this, the XTAG analysis follows the findings on the attachment
sites of adjunct clauses for conditional clauses ([Iatridou, 1991]) and for infinitival clauses
([Browning, 1987]). One compelling argument is based on Binding Condition C effects. As
can be seen from examples (231)-(233) below, no Binding Condition violation occurs when the
adjunct is sentence initial, but the subject of the matrix clause clearly governs the adjunct
clause when it is in sentence final position and co-indexation of the pronoun with the subject
of the adjunct clause is impossible.

(231) Unless she_i hurries, Mary_i will be late for the meeting.

(232) *She_i will be late for the meeting unless Mary_i hurries.

(233) Mary_i will be late for the meeting unless she_i hurries.

We had previously treated subordinating conjunctions as a subclass of conjunction, but 
are now assigning them the POS preposition, as there is such clear overlap between words 
that function as prepositions (taking NP complements) and subordinating conjunctions (taking 
clausal complements). While there are some prepositions which only take NP complements and 
some which only take clausal complements, many take both, as shown in examples (234)-(237),
and it seems artificial to assign them two different parts-of-speech.

(234) Helen left before the party. 



(235) Helen left before the party began. 

(236) Since the election, Bill has been elated. 

(237) Since winning the election, Bill has been elated. 

Each subordinating conjunction selects the values of the <mode> and <comp> features 
of the subordinated S. The <mode> value constrains the types of clauses the subordinating 
conjunction may appear with and the <comp> value constrains the complementizers which 
may adjoin to that clause. For instance, indicative subordinate clauses may appear with the 
complementizer that as in (238), while participial clauses may not have any complementizers 
(239). 

(238) Midge left that car so that Sam could drive to work. 

(239) *Since that seeing the new VW, Midge could think of nothing else. 
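
The selection of <mode> and <comp> by a subordinating conjunction can be pictured as a lookup in the conjunction's lexical entry. The sketch below is a hedged illustration: the table SUBORDINATORS and the particular values given for so and since are assumptions chosen to reproduce the judgments in (238) and (239), not values taken from the XTAG lexicon.

```python
# Illustrative sketch only; the entries below are assumed, not XTAG lexicon data.

SUBORDINATORS = {
    # conjunction: the <mode> and <comp> values it allows on the subordinated S
    "so":    {"mode": {"ind"},        "comp": {"that", "nil"}},
    "since": {"mode": {"ind", "ger"}, "comp": {"nil"}},
}

def licenses(conjunction, mode, comp):
    """True if the conjunction accepts a clause with this mode and complementizer."""
    entry = SUBORDINATORS[conjunction]
    return mode in entry["mode"] and comp in entry["comp"]

print(licenses("so", "ind", "that"))     # cf. (238)
print(licenses("since", "ger", "that"))  # cf. (239)
print(licenses("since", "ger", "nil"))   # "Since seeing the new VW, ..."
```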



15.0.4 Multi-word Subordinating Conjunctions 

We extracted a list of multi-word conjunctions, such as as if, in order, and for all (that), from
[Quirk et al., 1985]. For the most part, the components of the complex are all anchors, as
shown in Figure 15.2(a). In one case, as ADV as, there is a great deal of latitude in the choice
of adverb, so this is a substitution site (Figure 15.2(b)). This multi-anchor treatment is very
similar to that proposed for idioms in [Abeille and Schabes, 1989], and the analysis of light
verbs in the XTAG grammar (see section 6.15).



15.1 "Bare" Adjunct Clauses 

"Bare" adjunct clauses do not have an overt subordinating conjunction, but are typically parallel 
in meaning to clauses with subordinating conjunctions. For this reason, we have elected to 
handle them using the same trees shown above, but with null anchors. They are selected at the 
same time and in the same way the PRO tree is, as they all have PRO subjects. Three values
of <mode> are licensed: inf (infinitive), ger (gerundive) and ppart (past participial).¹ They

1 We considered allowing bare indicative clauses, such as He died that others may live, but these were considered 
too archaic to be worth the additional ambiguity they would add to the grammar. 

[Figure 15.2: Trees Anchored by Subordinating Conjunctions: βvxPARBPs and βvxParbPs. Panel (a) is anchored by as soon as; panel (b) is anchored by as ... as.]

interact with complementizers as follows:

• Participial complements do not license any complementizers:²

(240) [Destroyed by the fire], the building still stood.

(241) The fire raged for days [destroying the building].

(242) *[That destroyed by the fire], the building still stood.

• Infinitival adjuncts, including purpose clauses, are licensed both with and without the
complementizer for.

(243) Harriet bought a Mustang [to impress Eugene].

(244) [To impress Harriet], Eugene dyed his hair.

(245) Traffic stopped [for Harriet to cross the street].



2 While these sound a bit like extraposed relative clauses (see [Kroch and Joshi, 1987]), those move only to
the right and adjoin to S; as these clauses are equally grammatical both sentence-initially and sentence-finally,
we are analyzing them as adjunct clauses.

[Figure 15.3: Sample Participial Adjuncts, panels (a) and (b)]



15.2 Discourse Conjunction 

The CONJs auxiliary tree is used to handle 'discourse' conjunction, as in sentence (246). Only
the coordinating conjunctions (and, or and but) are allowed to adjoin to the roots of matrix
sentences. Discourse conjunction with and is shown in the derived tree in Figure 15.4.

(246) And Truffula trees are what everyone needs! [Seuss, 1971]

Chapter 16

Imperatives 



Imperatives in English do not require overt subjects. The subject in imperatives is second 
person, i.e. you, whether it is overt or not, as is clear from the verbal agreement and the 
interpretation. Imperatives with overt subjects can be parsed using the trees already needed 
for declaratives. The imperative cases in which the subject is not overt are handled by the 
imperative trees discussed in this section. 

The imperative trees in the English XTAG grammar are identical to the declarative tree except
that the NP_0 subject position is filled by an ε, the NP_0 <agr pers> feature is set in the tree
to the value 2nd and the <mode> feature on the root node has the value imp. The value
for <agr pers> is hardwired into the epsilon node and ensures the proper verbal agreement
for an imperative. The <mode> value of imp on the root node is recognized as a valid mode
for a matrix clause. The imp value for <mode> also allows imperatives to be blocked from
appearing as embedded clauses. Figure 16.1 is the imperative tree for the transitive tree family.

[Figure 16.1: Transitive imperative tree: αInx0Vnx1]



Chapter 17 

Gerund NP's 



There are two types of gerunds identified in the linguistics literature. One is the class of 
derived nominalizations (also called nominal gerundives or action nominalizations) exemplified 
in (247), which instantiates the direct object within an of PP. The other is the class of so- 
called sentential or VP gerundives exemplified in (248). In the English XTAG grammar, the 
derived nominalizations are termed determiner gerunds, and the sentential or VP gerunds 
are termed NP gerunds. 

(247) Some think that the selling of bonds is beneficial. 



(248) Are private markets approving of Washington bashing Wall Street? 

Both types of gerunds exhibit a similar distribution, appearing in most places where NP's
are allowed.¹ The bold face portions of sentences (249)-(251) show examples of gerunds as a
subject and as the object of a preposition.

(249) Avoiding such losses will take a monumental effort. 



(250) Mr. Nolen's wandering doesn't make him a weirdo. 



(251) Are private markets approving of Washington bashing Wall Street? 

The motivation for splitting the gerunds into two classes is semantic as well as structural
in nature. Semantically, the two gerunds are in sharp contrast with each other. NP gerunds
refer to an action, i.e., a way of doing something, whereas determiner gerunds refer to a fact.
Structurally, there are a number of properties (extensively discussed in [Lees, 1960]) that show
that NP gerunds have the syntax of verbs, whereas determiner gerunds have the syntax of basic
nouns. Firstly, the fact that the direct object of the determiner gerund can only appear within
an of PP suggests that the determiner gerund, like nouns, is not a case assigner and needs
insertion of the preposition of for assignment of case to the direct object. NP gerunds, like
verbs, need no such insertion and can assign case to their direct object. Secondly, like nouns,
only determiner gerunds can appear with articles (cf. examples (252) and (253)). Thirdly,
determiner gerunds, like nouns, can be modified by adjectives (cf. example (254)), whereas
NP gerunds, like verbs, resist such modification (cf. example (255)). Fourthly, nouns, unlike
verbs, cannot co-occur with aspect (cf. examples (256) and (257)). Finally, only NP gerunds,
like verbs, can take adverbial modification (cf. examples (258) and (259)).

1 An exception being the NP positions in "equative BE" sentences, such as John is my father.



(252) ... the proving of the theorem ...   (det ger with article)

(253) * ... the proving the theorem ...   (NP ger with article)

(254) John's rapid writing of the book ...   (det ger with Adj)

(255) * John's rapid writing the book ...   (NP ger with Adj)

(256) * John's having written of the book ...   (det ger with aspect)

(257) John having written the book ...   (NP ger with aspect)

(258) * His writing of the book rapidly ...   (det ger with Adverb)

(259) His writing the book rapidly ...   (NP ger with Adverb)



In English XTAG, the two types of gerunds are assigned separate trees within each tree
family, but in order to capture their similar distributional behavior, both are assigned NP as
the category label of their top node. The feature gerund = +/- distinguishes gerund NP's
from regular NP's where needed.² The determiner gerund and the NP gerund trees are discussed
in sections 17.1 and 17.2 respectively.



17.1 Determiner Gerunds 



The determiner gerund tree in Figure 17.1 is anchored by a V, capturing the fact that the
gerund is derived from a verb. The verb projects an N and instantiates the direct object as an
of PP. The nominal category projected by the verb can now display all the syntactic properties
of basic nouns, as discussed above. For example, it can be straightforwardly modified by
adjectives; it cannot co-occur with aspect; and it can appear with articles. The only difference
between the determiner gerund nominal and the basic nominals is that the former cannot occur
without the determiner, whereas the latter can. The determiner gerund tree therefore has an
initial D modifying the N.³ It is used for gerunds such as the ones in bold face in sentences
(260), (261) and (262).

[Figure 17.1: Determiner Gerund tree from the transitive tree family: αDnx0Vnx1]

The D node can take a simple determiner (cf. example (260)), a genitive pronoun (cf.
example (261)), or a genitive NP (cf. example (262)).⁴

2 This feature is also needed to restrict the selection of gerunds in NP positions. For example, the subject
and object NP's in the "equative BE" tree (Tnx0BEnx1) cannot be filled by gerunds, and are therefore assigned
the feature gerund = -, which prevents gerunds (which have the feature gerund = +) from substituting into
these NP positions.

3 Note that the determiner can adjoin to the gerund only from within the gerund tree. Adjunction of
determiners to the gerund root node is prevented by constraining determiners to only select NP's with the feature
gerund = -. This rules out sentences like Private markets approved of (*the) [the selling of bonds].

(260) Some think that the selling of bonds is beneficial. 

(261) His painting of Mona Lisa is highly acclaimed. 

(262) Are private markets approving of Washington's bashing of Wall Street? 

17.2 NP Gerunds 

NP gerunds show a number of structural peculiarities, the crucial one being that they have the
internal properties of sentences. In the English XTAG grammar, we adopt a position similar
to that of [Rosenbaum, 1967] and [Emonds, 1970]: that these gerunds are NP's exhaustively
dominating a clause. Consequently, the tree assigned to the transitive NP gerund (cf.
Figure 17.2) looks exactly like the declarative transitive tree for clauses except for the root
node label and the feature values. The anchoring verb projects a VP. Auxiliary adjunction is
allowed, subject to one constraint: that the highest verb in the verbal sequence be in gerundive
form, regardless of whether it is a main or auxiliary verb. This constraint is implemented by
requiring the topmost VP node to be <mode> = ger. In the absence of any adjunction, the
anchoring verb itself is forced to be gerundive. But if the verbal sequence has more than one
verb, then the sequence and form of the verbs is limited by the restrictions that each verb in the
sequence imposes on the succeeding verb. The nature of these restrictions for sentential clauses,
and the manner in which they are implemented in XTAG, are both discussed in Chapter 20.
The analysis and implementation discussed there differ from those required for gerunds only in
one respect: that the highest verb in the verbal sequence is required to be <mode> = ger.

Additionally, the subject in the NP gerund tree is required to have <case>=acc/none/gen, 
i.e., it can be either a PRO (cf. example 263), a genitive NP (cf. example 264), or an accusative 
NP (cf. example 265). The whole NP formed by the gerund can occur in either nominative or 
accusative positions. 

(263) . . . John does not like wearing a hat. 



(264) Are private markets approving of Washington's bashing Wall Street? 

(265) Mother disapproved of me wearing such casual clothes. 

One question that arises with respect to gerunds is whether there is anything special about 
their distribution as compared to other types of NP's. In fact, it appears that gerund NP's 
can occur in any NP position. Some verbs might not seem to be very accepting of gerund NP 
arguments, as in (266) below, but we believe this to be a semantic incompatibility rather than 
a syntactic problem since the same structures are possible with other lexical items. 

4 The trees for genitive pronouns and genitive NP's have the root node labelled as D because they seem to
function as determiners and as such, sequence with the rest of the determiners. See Chapter 18 for discussion on
determiner trees.

[Figure 17.2: NP Gerund tree from the transitive tree family: αGnx0Vnx1]



(266) ? [NP John's tinkering NP] ran.

(267) [NP John's tinkering NP] worked.

By having the root node of gerund trees be NP, the gerunds have the same distribution as 
any other NP in the English XTAG grammar without doing anything exceptional. The clause 
structure is captured by the form of the trees and by inclusion in the tree families. 



17.3 Gerund Passives 

It was mentioned above that the NP gerunds display certain clausal properties. It is therefore 
not surprising that they too have their own set of transformationally related structures. For 
example, NP gerunds allow passivization just like their sentential counterparts (cf. examples 
(268) and (269)). 

(268) The lawyers objected to the slanderous book being written by John. 

(269) Susan could not forget having been embarrassed by the vicar.

In the English XTAG grammar, gerund passives are treated in almost exactly the same
way as sentential passives, and are assigned separate trees within the appropriate tree
families. The passives occur in pairs, one with the by phrase, and another without it. There are
two feature restrictions imposed on the passive trees: (a) only verbs with <mode> = ppart
(i.e., verbs with passive morphology) can be the anchors, and (b) the highest verb in the verb
sequence is required to be <mode> = ger. The two restrictions, together, ensure the selection
of only those sequences of auxiliary verb(s) that select <mode> = ppart and <passive> =
+ (such as being or having been but NOT having). The passive trees are assumed to be related
to only the NP gerund trees (and not the determiner gerund trees), since passive structures
involve movement of some object to the subject position (in a movement analysis). Also, like
the sentential passives, gerund passives are found in most tree families that have a direct object
in the declarative tree. Figure 17.3 shows the gerund passive trees for the transitive tree family.
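
The joint effect of restrictions (a) and (b) can be sketched as a check over a sequence of auxiliaries placed above a passive anchor. The table AUX below is an assumption made for illustration: it records, for three auxiliary forms, their own <mode> and the <mode> and <passive> values they are taken to require of the next verb. It is not the XTAG auxiliary lexicon, and passive_gerund_ok is a hypothetical name.

```python
# Illustrative sketch only; the AUX table is an assumed simplification.

AUX = {
    # form: (own mode, mode required of next verb, passive value required of next verb)
    "being":  ("ger",   "ppart", "+"),
    "having": ("ger",   "ppart", "-"),
    "been":   ("ppart", "ppart", "+"),
}

def passive_gerund_ok(aux_sequence):
    """Check a top-down list of auxiliaries above a passive anchor.

    Restriction (a): the anchor has mode ppart and passive +.
    Restriction (b): the highest verb in the sequence has mode ger.
    """
    anchor = ("ppart", "+")
    # Auxiliaries themselves are treated as non-passive verbs.
    verbs = [(AUX[form][0], "-") for form in aux_sequence] + [anchor]
    if verbs[0][0] != "ger":
        return False
    for form, following in zip(aux_sequence, verbs[1:]):
        _, wanted_mode, wanted_passive = AUX[form]
        if following != (wanted_mode, wanted_passive):
            return False
    return True

print(passive_gerund_ok(["being"]))           # "being written", cf. (268)
print(passive_gerund_ok(["having", "been"]))  # "having been embarrassed", cf. (269)
print(passive_gerund_ok(["having"]))          # "having" directly above the passive anchor
```

Under these assumptions being and having been are accepted while bare having is rejected, matching the contrast stated in the paragraph above.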



[Figure 17.3: Passive Gerund trees from the transitive tree family: αGnx1Vbynx0 (a) and αGnx1V (b)]



Part IV 

Other Constructions 

Chapter 18

Determiners and Noun Phrases 



In our English XTAG grammar,¹ all nouns select the noun phrase (NP) tree structure shown
in Figure 18.1. Common nouns do not require determiners in order to form grammatical
NPs. Rather than being ungrammatical, singular countable nouns without determiners are
restricted in interpretation and can only be interpreted as mass nouns. Allowing all nouns to
head determinerless NPs correctly treats the individuation in countable NPs as a property of
determiners. Common nouns have negative ("-") values for determiner features in the lexicon in
our analysis and can only acquire a positive ("+") value for those features if determiners adjoin
to them. Other types of NPs such as pronouns and proper nouns have been argued by Abney
[Abney, 1987] to either be determiners or to move to the determiner position because they
exhibit determiner-like behavior. We can capture this insight in our system by giving pronouns
and proper nouns positive values for determiner features. For example, pronouns and proper
nouns would be marked as definite, a value that NPs containing common nouns can only obtain
by having a definite determiner adjoin. In addition to the determiner features, nouns also have
values for features such as reflexive (refl), case, pronoun (pron) and conjunction (conj).

A single tree structure is selected by simple determiners, an auxiliary tree which adjoins
to NP. An example of this determiner tree anchored by the determiner these is shown in
Figure 18.2. In addition to the determiner features, the tree in Figure 18.2 has noun features
such as case (see section 4.4.2), the conj feature to control conjunction (see Chapter 21),
rel-clause = - (see Chapter 14) and gerund = - (see Chapter 17), which prevent determiners from
adjoining on top of relative clauses and gerund NPs respectively, and the displ-const feature
which is used to simulate multi-component adjunction.

Complex determiners such as genitives and partitives also anchor tree structures that adjoin
to NP. They differ from the simple determiners in their internal complexity. Details of our
treatment of these more complex constructions appear in Sections 18.3 and 18.4. Sequences of
determiners, as in the NPs all her dogs or those five dogs, are derived by multiple adjunctions
of the determiner tree, with each tree anchored by one of the determiners in the sequence. The
order in which the determiner trees can adjoin is controlled by features.



This treatment of determiners as adjoining onto NPs is similar to that of | Abeille, 1990 [, 
and allows us to capture one of the insights of the DP hypothesis, namely that determiners 



select NPs as complements. In Figure 18.2 the determiner and its NP complement appear in 



A more detailed discussion of this analysis can be found in Hockey and Mateyak, 1998 
[Figure 18.1 (tree not reproduced): an NP node whose features compl, gen, definite, decreas,
quan, const, card, conj, pron, wh, case, refl and agr carry the indices <1>-<13>, above a second
feature set with the same thirteen coindexed features, in which case is nom/acc.]

Figure 18.1: NP Tree



the configuration that is typically used in LTAG to represent selectional relationships. That 
is, the head serves as the anchor of the tree and its complement is a sister node in the same
elementary tree. 

The XTAG treatment of determiners uses nine features for representing their properties:
definiteness (definite), quantity (quan), cardinality (card), genitive (gen), decreasing (decreas),
constancy (const), wh, agreement (agr), and complement (compl). Seven of these
features were developed by semanticists for their accounts of semantic phenomena ([Keenan
and Stavi, 1986], [Barwise and Cooper, 1981], [Partee et al., 1990]), another was developed for
[Figure 18.2 (tree not reproduced): the determiner auxiliary tree anchored by these. The
feature set shown contains wh <1>, decreas <2>, compl <3>, gen <4>, card <5>, quan <6>,
definite <7>, const <8>, agr <9>, case <10> nom/acc, conj <11> and displ-const <12>.]

Figure 18.2: Determiner Trees with Features



a semantic account of determiner negation by one of the authors of this determiner analysis 
([Mateyak, 1997]), and the last is the familiar agreement feature. When used together, these
features also account for a substantial portion of the complex patterns of English determiner 
sequencing. Although we do not claim to have exhaustively covered the sequencing of deter- 
miners in English, we do cover a large subset, both in terms of the phenomena handled and 
in terms of corpus coverage. The XTAG grammar has also been extended to include complex 
determiner constructions such as genitives and partitives using these determiner features. 

Each determiner carries with it a set of values for these features that represents its own 
properties, and a set of values for the properties of NPs to which it can adjoin. The features are
crucial to ordering determiners correctly. The semantic definitions underlying the features are 
given below. 



Definiteness: Possible Values [+/-].

A function f is definite iff f is non-trivial and whenever $f(s) \neq \emptyset$ then it is always the
intersection of one or more individuals. [Keenan and Stavi, 1986]



Quantity: Possible Values [+/-].

If A and B are sets denoting an NP and associated predicate, respectively; E is a domain
in a model M, and F is a bijection from $M_1$ to $M_2$, then we say that a determiner satisfies
the constraint of quantity if $\mathrm{Det}_{E_1}AB \leftrightarrow \mathrm{Det}_{E_2}F(A)F(B)$. [Partee et al., 1990]



Cardinality: Possible Values [+/-]. 

A determiner D is cardinal iff $D \in$ cardinal numbers $\geq 1$.



Genitive: Possible Values [+/-]. 

Possessive pronouns and the possessive morpheme ('s) are marked gen+; all other nouns
are gen-.

Decreasing: Possible Values [+/-]. 

A set Q of properties is decreasing iff whenever $s \leq t$ and $t \in Q$ then $s \in Q$. A function f is
decreasing iff for all properties s, f(s) is a decreasing set.

A non-trivial NP (one with a Det) is decreasing iff its denotation in any model is decreasing.
[Keenan and Stavi, 1986]



Constancy: Possible Values [+/-].

If A and B are sets denoting an NP and associated predicate, respectively, and E is
a domain, then we say that a determiner displays constancy if $(A \cup B) \subseteq E \subseteq E'$ then
$\mathrm{Det}_{E}AB \leftrightarrow \mathrm{Det}_{E'}AB$. Modified from [Partee et al., 1990]



Complement: Possible Values [+/-].

A determiner Q is positive complement if and only if for every set X, there exists a
continuous set of possible values for the size of the negated determined set, NOT(QX),
and the cardinality of QX is the only aspect of QX that can be negated (adapted from
[Mateyak, 1997]).



The wh-feature has been discussed in the linguistics literature mainly in relation to wh- 
movement and with respect to NPs and nouns as well as determiners. We give a shallow but 
useful working definition of the wh-feature below: 



Wh: Possible Values [+/-]. 

Interrogative determiners are wh+; all other determiners are wh-.



The agr feature is inherently a noun feature. While determiners are not morphologically
marked for agreement in English, many of them are sensitive to number. Many determiners
are semantically either singular or plural and must adjoin to nouns which are the same. For
example, a can only adjoin to singular nouns (a dog vs *a dogs), while many must have plurals
(many dogs vs *many dog). Other determiners such as some are unspecified for agreement in
our analysis because they are compatible with either singulars or plurals (some dog, some dogs).
The possible values of agreement for determiners are: [3sg, 3pl, 3].
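
The agreement check described above can be sketched in a few lines of Python. This is only an illustration, not the XTAG implementation: the function names and the two-noun lexicon are hypothetical, and the value 3 is treated as "third person, number unspecified", so that it is compatible with both 3sg and 3pl.

```python
# Illustrative sketch only; names and the tiny lexicon are assumptions.
# "3" stands for third person with number left unspecified.

def agr_compatible(det_agr, noun_agr):
    """Return True if the determiner's agr value can unify with the noun's."""
    if det_agr == "3" or noun_agr == "3":
        return True
    return det_agr == noun_agr

# Determiner values taken from the discussion in the text.
DET_AGR = {"a": "3sg", "many": "3pl", "some": "3"}
# Hypothetical noun entries for the examples in the text.
NOUN_AGR = {"dog": "3sg", "dogs": "3pl"}

def licensed(det, noun):
    """Check whether det can adjoin to an NP headed by noun, as far as agr goes."""
    return agr_compatible(DET_AGR[det], NOUN_AGR[noun])
```

Under these assumptions, licensed("a", "dogs") and licensed("many", "dog") fail, while some combines with either noun.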
Table 18.1: Determiner Features associated with D anchors

The determiner tree in Figure 18.2 shows the appropriate feature values for the determiner
these, while Table 18.1 shows the corresponding feature values of several other common
determiners.

In addition to the features that represent their own properties, determiners also have features
to represent the selectional restrictions they impose on the NPs they take as complements. The
selectional restriction features of a determiner appear on the NP foot node of the auxiliary
tree that the determiner anchors. The NP foot node in Figure 18.2 shows the selectional feature
restrictions imposed by these, while Table 18.2 shows the corresponding selectional feature
restrictions of several other determiners.



18.1 The Wh-Feature 



A determiner with a wh+ feature is always the left-most determiner in linear order since
no determiners have selectional restrictions that allow them to adjoin onto an NP with a wh+
feature value. The presence of a wh+ determiner makes the entire NP wh+, and this is correctly
represented by the coindexation of the determiner and root NP nodes' values for the wh-feature.
The selectional restrictions that wh+ determiners place on the NP foot node of their tree only
allow them to adjoin onto NPs that are wh- or unspecified for the wh-feature. Therefore ungrammatical
sequences such as *which what dog are impossible. The adjunction of wh+ determiners onto
wh+ pronouns is also prevented by the same mechanism.
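
A minimal Python sketch of this mechanism follows. It is an illustration under simplifying assumptions, not the XTAG code: feature structures are plain dictionaries, a missing key means "unspecified", and the names foot_accepts and adjoin_wh_determiner are hypothetical.

```python
# Illustrative sketch only. Feature values are "+" or "-";
# a missing key means the feature is unspecified.

def foot_accepts(restriction, np_features):
    """True if no specified restriction clashes with a specified NP value."""
    for name, required in restriction.items():
        actual = np_features.get(name)
        if actual is not None and actual != required:
            return False
    return True

# Foot-node restriction assumed for a wh+ determiner such as which or what:
# the NP it adjoins onto must be wh- (or unspecified for wh).
WH_DET_FOOT = {"wh": "-"}

def adjoin_wh_determiner(np_features):
    """Return the root NP features after adjunction, or None if it is blocked."""
    if not foot_accepts(WH_DET_FOOT, np_features):
        return None
    root = dict(np_features)
    root["wh"] = "+"  # the determiner's wh value is coindexed with the root NP
    return root

dog = {"wh": "-"}
what_dog = adjoin_wh_determiner(dog)             # adjunction succeeds
which_what_dog = adjoin_wh_determiner(what_dog)  # blocked: the NP is already wh+
```

Because the root NP becomes wh+ after the first adjunction, a second wh+ determiner finds no compatible foot node, which is the sense in which *which what dog is excluded.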



18.2 Multi-word Determiners 



The system recognizes the multi-word determiners a few and many a. The features for a
multi-word determiner are located on the parent node of its two components (see Figure 18.3). We
chose to represent these determiners as multi-word constituents because neither determiner 
retains the same set of features as either of its parts. For example, the determiner a is 3sg and 
few is decreasing, while a few is 3pl and increasing. Additionally, many is 3pl and a displays 
constancy, but many a is 3sg and does not display constancy. Example sentences appear in 
(270)-(271). 



• Multi-word Determiners 



(270) a few teaspoons of sugar should be adequate . 



(271) many a man has attempted that stunt, but none have succeeded . 



2 We use the symbol UN to represent the fact that the selectional restrictions for a given feature are unspecified, 
meaning the noun phrase that the determiner selects can be either positive or negative for this feature. 
3 Except one which is 3sg. 
4 Except one which is compl+.

5 A partitive can be either quan+ or quan-, depending upon the nature of the noun that anchors the partitive. 
If the anchor noun is modified, then the quantity feature is determined by the modifier's quantity value. 

6 In addition to this tree, these would also anchor another auxiliary tree that adjoins onto card+ determiners. 
7 one differs from the rest of CARD in selecting singular nouns 
Table 18.2: Selectional Restrictions Imposed by Determiners on the NP foot node

Table 18.3: Selectional Restrictions Imposed by Groups of Determiners/Determiner Constructions



[Figure 18.3 (tree not reproduced). Recoverable feature values: a first feature set coindexes
const <5>, definite <6>, quan <7>, card <8>, gen <9>, decreas <10>, wh <11> and agr <4>; a
second is wh-, quan+, predet-, gen-, decreas- and card-, with agr pers 3, num sing; a third
has displ-const <1>, conj <2>, case <3> nom/acc and agr <4>, and is wh-, quan-, gen-,
definite-, decreas-, const- and card-.]

Figure 18.3: Multi-word Determiner tree: βDDnx



18.3 Genitive Constructions 

There are two kinds of genitive constructions: genitive pronouns, and genitive NP's (which
have an explicit genitive marker, 's, associated with them). It is clear from examples such as
her dog returned home and her five dogs returned home vs *dog returned home that genitive
pronouns function as determiners and as such, they sequence with the rest of the determiners.
The features for the genitives are the same as for other determiners. Genitives are not required
to agree with either the determiners or the nouns in the NPs that they modify. The value of
the agr feature for an NP with a genitive determiner depends on the NP to which the genitive
determiner adjoins. While it might seem to make sense to take their as 3pl, my as 1sg, and
Alfonso's as 3sg, this number and person information only affects the genitive NP itself and
bears no relationship to the number and person of the NPs with these items as determiners.
Consequently, we have represented agr as unspecified for genitives in Table 18.1.

Genitive NP's are particularly interesting because they are potentially recursive structures. 
Complex NP's can easily be embedded within a determiner. 

(272) [[[John]'s friend from high school]'s uncle]'s mother came to town.

There are two things to note in the above example. One is that in embedded NPs, the 
genitive morpheme comes at the end of the NP phrase, even if the head of the NP is at the 
beginning of the phrase. The other is that the determiner of an embedded NP can also be a 
genitive NP, hence the possibility of recursive structures. 

In the XTAG grammar, the genitive marker, 's, is separated from the lexical item that it
is attached to and given its own category (G). In this way, we can allow the full complexity
of NP's to come from the existing NP system, including any recursive structures. As with the
simple determiners, there is one auxiliary tree structure for genitives which adjoins to NPs. As
can be seen in Figure 18.4, this tree is anchored by the genitive marker 's and has a branching D node
which accommodates the additional internal structure of genitive determiners. Also, like simple
determiners, there is one initial tree structure (Figure 18.5) available for substitution where
needed, as in, for example, the Determiner Gerund NP tree (see Chapter 17 for discussion on
determiners for gerund NP's).

Since the NP node which is sister to the G node could also have a genitive determiner in it, 
the type of genitive recursion shown in (272) is quite naturally accounted for by the genitive 
tree structure used in our analysis. 



18.4 Partitive Constructions 

The deciding factor for including an analysis of partitive constructions (e.g. some kind of, all
of) as complex determiner constructions was the behavior of the agreement features. If
partitive constructions are analyzed as an NP with an adjoined PP, then we would expect to 
get agreement with the head of the NP (as in (273)). If, on the other hand, we analyze them 
as a determiner construction, then we would expect to get agreement with the noun that the 
determiner sequence modifies (as we do in (274)). 

(273) a kind [of these machines] is prone to failure. 



(274) [a kind of] these machines are prone to failure. 
[Figure 18.4 (tree not reproduced): the genitive determiner auxiliary tree, anchored by the
genitive marker 's (category G, gen+). The feature sets shown include decreas <2>, gen <3>,
card <4>, quan <5>, definite <6>, const <7>, wh <8> and agr <1> on the root NP, agr <1> on the
NP foot node, and wh <10> with case nom/acc on the NP substitution node.]

Figure 18.4: Genitive Determiner Tree

Figure 18.5: Genitive NP tree for substitution: αDnxG



In other words, for partitive constructions, the semantic head of the NP is the second rather 
than the first noun in linear order. That the agreement shown in (274) is possible suggests that 
the second noun in linear order in these constructions should also be treated as the syntactic
head. Note that both the partitive and PP readings are usually possible for a particular NP. In 
the cases where either the partitive or the PP reading is preferred, we take it to be just that, 
a preference, most appropriately modeled not in the grammar but in a component such as the 
heuristics used with the XTAG parser for reducing the analyses produced to the most likely. 
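
The contrast between the two readings can be illustrated with a small Python sketch. It is a toy model of the agreement argument only, with hypothetical names; it assumes kind is 3sg and machines is 3pl, as in examples (273) and (274).

```python
# Illustrative sketch only; all names are hypothetical.

def agreement_source(nouns_agr, analysis):
    """Pick the agr value that the verb must match under each analysis.

    nouns_agr is (agr of the first noun, agr of the second noun).
    """
    first, second = nouns_agr
    if analysis == "pp":         # NP with adjoined PP: the first noun is the head
        return first
    if analysis == "partitive":  # complex determiner: the second noun is the head
        return second
    raise ValueError(f"unknown analysis: {analysis}")

# "a kind of these machines": kind is 3sg, machines is 3pl.
KIND_OF_THESE_MACHINES = ("3sg", "3pl")
VERB_AGR = {"is": "3sg", "are": "3pl"}

def verb_for(analysis):
    """Return the verb form that agrees with the subject under the given analysis."""
    subject_agr = agreement_source(KIND_OF_THESE_MACHINES, analysis)
    return next(verb for verb, agr in VERB_AGR.items() if agr == subject_agr)
```

Here verb_for("pp") yields is, matching (273), and verb_for("partitive") yields are, matching (274).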



In our analysis the partitive tree in Figure 18.6 is anchored by one of a limited group of
nouns that can appear in the determiner portion of a partitive construction. A rough semantic
characterization of these nouns is that they either represent quantity (e.g. part, half, most, pot,
cup, pound etc.) or classification (e.g. type, variety, kind, version etc.). In the absence of a
more implementable characterization we use a list of such nouns compiled from a descriptive
grammar [Quirk et al., 1985], a thesaurus, and from online corpora. In our grammar the nouns
on the list are the only ones that select the partitive determiner tree.

Like other determiners, partitives can adjoin to an NP consisting of just a noun ('[a certain
kind of] machine'), or adjoin to NPs that already have determiners ('[some parts of] these
machines'). Notice that just as for the genitives, the complexity and the recursion are contained
below the D node and the rest of the structure is the same as for simple determiners.

18.5 Adverbs, Noun Phrases, and Determiners 

Many adverbs interact with the noun phrase and determiner system in English. For example, 
consider sentences (275)-(282) below.



(275) Approximately thirty people came to the lecture. 

(276) Practically every person in the theater was laughing hysterically during that scene. 

(277) Only John's crazy mother can make stuffing that tastes so good. 

(278) Relatively few programmers remember how to program in COBOL. 

(279) Not every martian would postulate that all humans speak a universal language. 

(280) Enough money was gathered to pay off the group gift. 

(281) Quite a few burglaries occurred in that neighborhood last year. 



(282) I wanted to be paid double the amount they offered. 
[Figure 18.6 (tree not reproduced): the partitive determiner auxiliary tree. The root NP
coindexes decreas <5>, compl <6>, gen <7>, card <8>, quan <9>, definite <10>, const <11> and
wh <12> with the D node, and agr <13>, case <14>, conj <15> and displ-const <16> with the NP
foot node, which is marked NA and is rel-clause- with case nom/acc.]

Figure 18.6: Partitive Determiner Tree



Although there is some debate in the literature as to whether these should be classified as
determiners or adverbs, we believe that these items that interact with the NP and determiner
system are in fact adverbs. These items exhibit a broader distribution than either determiners
or adjectives in that they can modify many other phrasal categories, including adjectives, verb
phrases, prepositional phrases, and other adverbs.

Using the determiner feature system, we can obtain a close approximation to an accurate
characterization of the behavior of the adverbs that interact with noun phrases and determiners.
Adverbs can adjoin to either a determiner or a noun phrase (see Figure 18.7), with the adverbs
restricting what types of NPs or determiners they can modify by imposing feature requirements
on the foot D or NP node. For example, the adverb approximately, seen in (275) above, selects
for determiners that are card+. The adverb enough in (280) is an example of an adverb that
selects for a noun phrase, specifically a noun phrase that is not modified by a determiner.





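[Figure 18.7 (trees not reproduced): (a) approximately adjoined to the determiner in
approximately thirty people; (b) double adjoined to a noun phrase introduced by the.]

Figure 18.7: (a) Adverb modifying a determiner; (b) Adverb modifying a noun phrase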



Most of the adverbs that modify determiners and NPs divide into six classes, with some 
minor variation within classes, based on the pattern of these restrictions. Three of the classes 
are adverbs that modify determiners, while the other three modify NPs. 

The largest of the six classes is the class of adverbs that modify cardinal determiners. This
class includes, among others, the adverbs about, at most, exactly, nearly, and only. These
adverbs have the single restriction that they must adjoin to determiners that are card+. Another
class of adverbs consists of those that can modify the determiners every, all, any, and no. The
adverbs in this class are almost, nearly, and practically. Closely related to this class are the
adverbs mostly and roughly, which are restricted to modifying every and all, and hardly, which can
only modify any. To select for every, all, and any, these adverbs select for determiners that are
[quan+, card-, const+, compl+], and to select for no, the adverbs choose a determiner that
is [quan+, decreas+, const+]. The third class of adverbs that modify determiners are those
that modify the determiners few and many, representable by the feature sequences [quan+,
decreas+, const-] and [quan+, decreas-, const-, 3pl, compl+], respectively. Examples of
these adverbs are awfully, fairly, relatively, and very.
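
The way an adverb picks out determiners by feature bundle can be sketched as follows. This is a hedged illustration, not the grammar's implementation: the dictionaries are hypothetical, each determiner entry lists only the feature values mentioned in the text (thirty is assumed to be card+ as a cardinal number), and a feature missing from an entry is treated as not satisfying a requirement.

```python
# Illustrative sketch only; entries list just the values given in the text.
DETERMINERS = {
    "thirty": {"card": "+"},
    "few": {"quan": "+", "decreas": "+", "const": "-"},
    "many": {"quan": "+", "decreas": "-", "const": "-", "agr": "3pl", "compl": "+"},
}

# Each adverb lists one or more feature bundles it accepts on the foot D node.
ADVERB_SELECTS = {
    "approximately": [{"card": "+"}],
    "relatively": [
        {"quan": "+", "decreas": "+", "const": "-"},
        {"quan": "+", "decreas": "-", "const": "-", "agr": "3pl", "compl": "+"},
    ],
}

def selects(adverb, determiner):
    """True if some bundle required by the adverb is fully matched by the determiner."""
    features = DETERMINERS[determiner]
    return any(
        all(features.get(name) == value for name, value in bundle.items())
        for bundle in ADVERB_SELECTS[adverb]
    )
```

With these entries, approximately combines only with thirty, while relatively combines with few and many but not with thirty.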

Of the three classes of adverbs that modify noun phrases, one actually consists of a single
adverb, not, which only modifies determiners that are compl+. Another class consists of the
focus adverbs, at least, even, only, and just. These adverbs select NPs that are wh- and card-.
For the NPs that are card+, the focus adverbs actually modify the cardinal determiner, and so
these adverbs are also included in the first class of adverbs mentioned in the previous paragraph.
The last major class that modifies NPs consists of the adverbs double and twice, which select NPs
that are [definite+] (i.e., the, this/that/those/these, and the genitives).

Although these restrictions succeed in recognizing the correct determiner/adverb sequences,
a few unacceptable sequences slip through. For example, in handling the second class of adverbs
mentioned above, every, all, and any share the features [quan+, card-, const+, compl+]
with a and another, and so *nearly a man is accepted by this system. In addition to this
over-generation within a major class, the adverb quite selects for determiners and NPs in what
seems to be a purely idiosyncratic fashion. Consider the following examples.
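
The over-generation just described follows directly from selecting by feature bundle, as the short Python sketch below shows. It is an illustration under stated assumptions: the lexicon is hypothetical and records only the four shared feature values named in the text.

```python
# Illustrative sketch only; the lexicon records just the shared values from the text.
SHARED = {"quan": "+", "card": "-", "const": "+", "compl": "+"}
DETERMINERS = {name: dict(SHARED) for name in ("every", "all", "any", "a", "another")}

# Determiners that an adverb such as nearly is meant to modify with this bundle.
INTENDED = {"every", "all", "any"}

def accepted(required, lexicon):
    """Return every determiner whose features match all required values."""
    return {
        name
        for name, features in lexicon.items()
        if all(features.get(key) == value for key, value in required.items())
    }

overgenerated = accepted(SHARED, DETERMINERS) - INTENDED
```

Since a and another carry the same four values, they cannot be told apart from every, all and any by this bundle alone, so overgenerated contains exactly those two determiners.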

(283) a. Quite a few members of the audience had to leave. 

b. There were quite many new participants at this year's conference. 

c. Quite few triple jumpers have jumped that far. 

d. Taking the day off was quite the right thing to do. 

e. The recent negotiation fiasco is quite another issue. 

f. Pandora is quite a cat! 

In examples (283a)-(283c), quite modifies the determiner, while in (283d)-(283f), quite modifies
the entire noun phrase. Clearly, it functions in a different manner in the two sets of
sentences; in (283a)-(283c), quite intensifies the amount implied by the determiner, whereas in
(283d)-(283f), it singles out an individual from the larger set to which it belongs. To capture the
selectional restrictions needed for (283a)-(283c), we utilize the two sets of features mentioned
previously for selecting few and many. However, a few cannot be singled out so easily; using the
sequence [quan+, card-, decreas-, const+, 3pl, compl-], we also accept the ungrammatical
NPs *quite several members and *quite some members (where quite modifies some). In selecting
the as in (283d) with the features [definite+, gen-, 3sg], quite also selects this and that, which
are ungrammatical in this position. Examples (283e) and (283f) present yet another obstacle
in that in selecting another and a, quite erroneously selects every and any.

It may be that there is an undiscovered semantic feature that would alleviate these difficul- 
ties. However, on the whole, the determiner feature system we have proposed can be used as a 
surprisingly efficient method of characterizing the interaction of adverbs with determiners and 
noun phrases. 



Chapter 19 

Modifiers 



This chapter covers various types of modifiers: adverbs, prepositions, adjectives, and noun
modifiers in noun-noun compounds. These categories optionally modify other lexical items
and phrases by adjoining onto them. In their modifier function these items are adjuncts; they 
are not part of the subcategorization frame of the items they modify. Examples of some of 
these modifiers are shown in (284)-(286). 

(284) [adv certainly adv], the October 13 sell-off didn't settle any stomachs . (WSJ)

(285) Mr. Bakes [adv previously adv] had a turn at running Continental . (WSJ)

(286) most [adj foreign adj] [n government n] [n bond n] [prices] rose [pp during the week
pp].

The trees used for the various modifiers are quite similar in form. The modifier anchors 
the tree and the root and foot nodes of the tree are of the category that the particular anchor 
modifies. Some modifiers, e.g. prepositions, select for their own arguments and those are also 
included in the tree. The foot node may be to the right or the left of the anchoring modifier 
(and its arguments) depending on whether that modifier occurs before or after the category it 
modifies. For example, almost all adjectives appear to the left of the nouns they modify, while 
prepositions appear to the right when modifying nouns. 



19.1 Adjectives 

In addition to being modifiers, adjectives in the XTAG English grammar can also anchor
clauses (see Adjective Small Clauses in Chapter [|). There is also one tree family, Intransitive 
with Adjective (Tnx0Vax1), that has an adjective as an argument and is used for sentences
such as Seth felt happy. In that tree family the adjective substitutes into the tree rather than 
adjoining as is the case for modifiers. 



As modifiers, adjectives anchor the tree shown in Figure 19.1. The features of the N node
onto which the βAn tree adjoins are passed through to the top node of the resulting N. The

1 Relative clauses are discussed in Chapter 14.
[Figure 19.1 (tree not reproduced): the root N and the N foot node (marked NA) coindex the
features gen <1>, definite <2>, decreas <3>, quan <4>, const <5>, card <6>, conj <7>,
displ-const <8>, wh <9>, pron <10>, assign-comp <11>, agr <12> and case <13>.]

Figure 19.1: Standard Tree for Adjective modifying a Noun: βAn



null adjunction marker (NA) on the N foot node imposes right binary branching such that each 
subsequent adjective must adjoin on top of the leftmost adjective that has already adjoined. 
Due to the NA constraint, a sequence of adjectives will have only one derivation in the XTAG 
grammar. The adjective's morphological features such as superlative or comparative are in- 
stantiated by the morphological analyzer. See Chapter [2^ for a description of how we handle 
comparatives. At this point, the treatment of adjectives in the XTAG English grammar does
not include selectional or ordering restrictions. Consequently, any adjective can adjoin onto
any noun and on top of any other adjective already modifying a noun. All of the modified noun
phrases shown in (287)-(290) currently parse with the same structure shown for colorless green
ideas in Figure 19.2.

(287) big green bugs 



(288) big green ideas 



(289) colorless green ideas 



(290) ?green big ideas 



[Figure 19.2 (tree not reproduced): right-branching structure
[NP [N colorless [N green [N ideas]]]], with each N foot node marked NA.]

Figure 19.2: Multiple adjectives modifying a noun

While (288)-(290) are all semantically anomalous, (290) also suffers from an ordering problem
that makes it seem ungrammatical in the absence of any licensing context. One could
argue that the grammar should accept (287)-(289) but not (290). One of the future goals for
the grammar is to develop a treatment of adjective ordering similar to that developed by [Hockey
and Mateyak, 1998] for determiners. An adequate implementation of ordering restrictions for
adjectives would rule out (290).
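
The effect of the NA constraint on adjective sequences can be sketched in Python as follows. This is a simplified illustration, not the XTAG parser: nested tuples stand in for N nodes, the function name is hypothetical, and no selectional or ordering restrictions are modeled, mirroring the current state of the grammar.

```python
# Illustrative sketch only; tuples stand in for N nodes.

def adjoin_adjectives(adjectives, noun):
    """Build the single right-branching structure for a sequence of adjectives.

    adjectives are given in left-to-right order. The adjective closest to the
    noun adjoins first, and each later one adjoins on top of the result.
    """
    tree = noun
    for adjective in reversed(adjectives):
        tree = (adjective, tree)  # new N node: adjective on the left, old N on the right
    return tree

colorless_green_ideas = adjoin_adjectives(["colorless", "green"], "ideas")
green_big_ideas = adjoin_adjectives(["green", "big"], "ideas")
```

Each sequence receives exactly one structure, and because nothing checks the order of the adjectives, green big ideas is built just as readily as colorless green ideas.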



19.2 Noun-Noun Modifiers 



Noun-noun compounding in the English XTAG grammar is very similar to adjective-noun 
modification. The noun modifier tree, shown in Figure 19.3, has essentially the same structure 
[Figure 19.3 (tree not reproduced): the root N and the N foot node (marked NA) coindex
assign-comp <1>, displ-const <2>, pron <3>, wh <4>, agr <5> and case <6>, with pron - and
case nom/acc; the anchoring N is also case nom/acc and pron -.]

Figure 19.3: Noun-noun compounding tree: βNn (not all features displayed)



as the adjective modifier tree in Figure 19.1, except for the syntactic category label of the 
anchor. 

Noun compounds have a variety of scope possibilities not available to adjectives, as illustrated
by the single bracketing possibility in (291) and the two possibilities for (292). This
ambiguity is manifested in the XTAG grammar by the two possible adjunction sites in the
noun-noun compound tree itself. Subsequent modifying nouns can adjoin either onto the root N
node or onto the N anchor node of that tree, which results in exactly the two bracketing
possibilities shown in (292). This inherent structural ambiguity results in noun-noun compounds
regularly having multiple derivations. However, the multiple derivations are not a defect in
the grammar because they are necessary to correctly represent the genuine ambiguity of these
phrases.



(291) [N big [N green design N]N]

(292) [N computer [N furniture design N]N]
[N [N computer furniture N] design N]
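
The two bracketings in (292) can be reproduced with a short Python sketch that enumerates every binary bracketing of a noun sequence. This is an illustration of the ambiguity only, not the XTAG derivation procedure, and the function name is hypothetical.

```python
# Illustrative sketch only: enumerate all binary bracketings of a noun sequence.

def bracketings(nouns):
    """Return every binary bracketing of the given list of nouns as a string."""
    if len(nouns) == 1:
        return [nouns[0]]
    results = []
    for split in range(1, len(nouns)):
        for left in bracketings(nouns[:split]):
            for right in bracketings(nouns[split:]):
                results.append(f"[{left} {right}]")
    return results

compound = bracketings(["computer", "furniture", "design"])
```

For the three nouns of (292) the list contains the two structures [computer [furniture design]] and [[computer furniture] design]; longer compounds yield correspondingly more bracketings.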



2 See Chapter 18 or [Hockey and Mateyak, 1998] for details of the determiner analysis.
Noun-noun compounds have no restriction on number. XTAG allows nouns to be either
singular or plural as in (293)-(295).

(293) Hyun is taking an algorithms course . 

(294) waffles are in the frozen foods section . 

(295) I enjoy the dog shows . 

19.3 Time Noun Phrases 

Although in general NPs cannot modify clauses or other NPs, there is a class of NPs, with
meanings that relate to time, that can do so. We call this class of NPs "Time NPs". Time NPs
behave essentially like PPs. Like PPs, time NPs can adjoin at four places: to the right of an
NP, to the right and left of a VP, and to the left of an S.

Time NPs may include determiners, as in this week in example (296), or may be single
lexical items as in today in example (297). Like other NPs, time NPs can also include adjectives, 
as in example (301). 

(296) Elvis left the building this week 

(297) Elvis left the building today 

(298) It has no bearing on our work force today (WSJ) 

(299) The fire yesterday claimed two lives 

(300) Today it has no bearing on our work force 

(301) Michael late yesterday announced a buy-back program 

The XTAG analysis for time NPs is fairly simple, requiring only the creation of proper NP 
auxiliary trees. Only nouns that can be part of time NPs will select the relevant auxiliary trees, 
and so only these types of NPs will behave like PPs under the XTAG analysis. Currently, about
60 words select Time NP trees, but since these words can form NPs that include determiners 
and adjectives, a large number of phrases are covered by this class of modifiers. 

Corresponding to the four positions listed above, time NPs can select one of the four trees
shown in Figure 19.4.

Determiners can be added to time NPs by adjunction in the same way that they are added
to NPs in other positions. The trees in Figure 19.5 show that the structures of examples (296)
and (297) differ only in the adjunction of this to the time NP in example (296).
The sentence 



3 There may be other classes of NPs, such as directional phrases, such as north, south etc., which behave 
similarly. We have not yet analyzed these phrases. 
[Figure 19.4 (trees not reproduced): four auxiliary trees, each with the time noun anchoring
an NP that attaches to an S, VP or NP foot node.]

Figure 19.4: Time Phrase Modifier trees: βNs, βNvx, βvxN, βnxN




[Figure 19.5 (trees not reproduced): derived trees for Elvis left the building this week and
Elvis left the building today.]

Figure 19.5: Time NPs with and without a determiner



(302) Esso said the Whiting field started production Tuesday (WSJ) 

has (at least) two different interpretations, depending on whether Tuesday attaches to said or
to started. Valid time NP analyses are available for both these interpretations and are shown
in Figure 19.6.

Derived tree structures for examples (298)-(301), which show the four possible time NP
positions, are shown in Figures 19.7 and 19.8. The derivation tree for example (301) is also
shown in Figure 19.8.
[Figure 19.7 (trees not reproduced): derived trees for examples (298)-(300).]

Figure 19.7: Time NPs in different positions (βvxN, βnxN and βNs)

19.4 Prepositions 

There are three basic types of prepositional phrases, and three places at which they can adjoin. 
The three types of prepositional phrases are: Preposition with NP Complement, Preposition 
[Figure 19.8 (trees not reproduced): derived tree for Michael late yesterday announced a
buy-back program, and its derivation tree with the nodes αnx0Vnx1[announced],
αNXN[Michael] (1), βNvx[yesterday] (2), αNXN[program] (2.2), βAn[late] (1.1), βDnx[a] (0)
and βNn[buy-back] (1).]

Figure 19.8: Time NPs: Derived tree and Derivation (βNvx position)



with Sentential Complement, and Exhaustive Preposition. The three places are to the right of
an NP, to the right of a VP, and to the left of an S. Each of the three types of PP can adjoin at
each of these three places, for a total of nine PP modifier trees. Table 19.1 gives the tree family
names for the various combinations of type and location. (See Section 23.4.2 for discussion of
the βspuPnx, which handles post-sentential comma-separated PPs.)





                     position and category modified
Complement type      pre-sentential   post-NP       post-VP
                     S modifier       NP modifier   VP modifier
S-complement         βPss             βnxPs         βvxPs
NP-complement        βPnxs            βnxPnx        βvxPnx
no complement        βPs              βnxP          βvxP
(exhaustive)

Table 19.1: Preposition Anchored Modifiers
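
The nine names in Table 19.1 appear to follow a regular pattern, which the Python sketch below makes explicit. The composition rule is an observation read off the table, not a documented XTAG procedure, and the function and dictionary names are hypothetical.

```python
# Illustrative sketch only; the composition rule is inferred from Table 19.1.

# Suffix contributed by the preposition's complement.
COMPLEMENT = {"S-complement": "s", "NP-complement": "nx", "exhaustive": ""}

def pp_tree_name(complement_type, position):
    """Compose the modifier tree name for a complement type and adjunction site."""
    comp = COMPLEMENT[complement_type]
    if position == "pre-sentential":  # PP precedes the S it modifies
        return "βP" + comp + "s"
    if position == "post-NP":         # PP follows the NP it modifies
        return "βnxP" + comp
    if position == "post-VP":         # PP follows the VP it modifies
        return "βvxP" + comp
    raise ValueError(f"unknown position: {position}")

ALL_NAMES = {
    pp_tree_name(complement_type, position)
    for complement_type in COMPLEMENT
    for position in ("pre-sentential", "post-NP", "post-VP")
}
```

Under this reading, the modified category (s, nx or vx) is written on the side of P where the foot node sits, and the complement, if any, directly follows P.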



The subset of preposition anchored modifier trees in Figure 19.9 illustrates the locations and 
the four PP types. Example sentences using the trees in Figure 19.9 are shown in (303)-(306). 
There are also more trees with multi-word prepositions as anchors. Examples of these are: 
ahead of, contrary to, at variance with and as recently as. 

Figure 19.9: Selected Prepositional Phrase Modifier trees: βPss, βnxPnx, βvxP and βvxPPnx

(303) [pp with Clove healthy pp], the veterinarian's bill will be more affordable . (βPss)⁴

(304) The frisbee [pp in the brambles pp] was hidden . (βnxPnx)

(305) Clove played frisbee [pp outside pp] . (βvxP)

(306) Clove played frisbee [pp outside of the house pp] . (βvxPPnx)

Prepositions that take NP complements assign accusative case to those complements (see 



section 4.4.3.1 for details). Most prepositions take NP complements. Subordinating conjunc- 
tions are analyzed in XTAG as Preps (see Section |l^ for details). Additionally, a few non- 
conjunction prepositions take S complements (see Section ^8] for details). 



19.5 Adverbs 

In the English XTAG grammar, VP and S-modifying adverbs anchor the auxiliary trees βARBs,
βsARB, βvxARB and βARBvx,⁵ allowing pre and post modification of S's and VP's. Besides
the VP and S-modifying adverbs, the grammar includes adverbs that modify other categories.
Examples of adverbs modifying an adjective, an adverb, a PP, an NP, and a determiner are
shown in (307)-(314). (See Sections 23.1.5 and 23.4.1 for discussion of the βpuARBpuvx and
βspuARB trees, which handle pre-verbal parenthetical adverbs and post-sentential
comma-separated adverbs.)

• Modifying an adjective 
(307) extremely good 



4 Clove healthy is an adjective small clause.

5 In the naming conventions for the XTAG trees, ARB is used for adverbs. Because the letters in A, Ad,
and Adv are all used for other parts of speech (adjective, determiner and verb), ARB was chosen to eliminate
ambiguity. A full explanation of the naming conventions is given in an appendix.

(308) rather tall

(309) rich enough 

• Modifying an adverb 

(310) oddly enough 

(311) very well 

• Modifying a PP 

(312) right through the wall 

• Modifying a NP 

(313) quite some time 

• Modifying a determiner 

(314) exactly five men 

XTAG has separate trees for each of the modified categories and for pre and post modification
where needed. The treatment given to adverbs here is very much in line with the
base-generation approach proposed by [Ernst, 1983], which assumes that all positions where an
adverb can occur are base-generated, and that the semantics of the adverb specifies the range
of positions it can occupy. The relevant semantic features of the adverbs are not currently
implemented; their implementation is scheduled for future work. The trees for adverb anchored
modifiers are very similar in form to the adjective anchored modifier trees. Examples of two of
the basic adverb modifier trees are shown in Figure 19.10.

Like the adjective anchored trees, these trees also have the NA constraint on the foot node to
restrict the number of derivations produced for a sequence of adverbs. Features of the modified
category are passed from the foot node to the root node, reflecting correctly that these types of
properties are unaffected by the adjunction of an adverb. A summary of the categories modified
and the position of adverbs is given in Table 19.2.

In the English XTAG grammar, no traces are posited for wh-adverbs, in line with the
base-generation approach ([Ernst, 1983]) for various positions of adverbs. Since convincing
arguments have been made against traces for adjuncts of other types (e.g. [Baltin, 1989]), and
since the reasons for wanting traces do not seem to apply to adjuncts, we make the general
assumption in our grammar that adjuncts do not leave traces. Sentence initial wh-adverbs
select the same auxiliary tree used for other sentence initial adverbs (βARBs) with the feature
<wh>=+. Under this treatment, the derived tree for the sentence How did you fall? is as in
Figure 19.11, with no trace for the adverb.



Figure 19.10: Adverb Trees for pre-modification of S: βARBs (a) and post-modification of a
VP: βvxARB (b)

Figure 19.11: Derived tree for How did you fall?

Figure 19.12: Complex adverb phrase modifier: βARBarbs






Category Modified    Position with respect to item modified
                     Pre                    Post
S                    βARBs                  βsARB
VP                   βARBvx, βpuARBpuvx     βvxARB
A                    βARBa                  βaARB
PP                   βARBpx                 βpxARB
ADV                  βARBarb                βarbARB
NP                   βARBnx
Det                  βARBd

Table 19.2: Simple Adverb Anchored Modifiers



There is one more adverb modifier tree in the grammar which is not included in Table 19.2.
This tree, shown in Figure 19.12, has a complex adverb phrase and is used for wh+ two-adverb
phrases that occur sentence initially, such as in sentence (315). Since how is the only wh+
adverb, it is the only adverb that can anchor this tree.

(315) how quickly did Srini fix the problem ? 

Focus adverbs such as only, even, just and at least are also handled by the system. Since the
syntax allows focus adverbs to appear in practically any position, these adverbs select most of
the trees listed in Table 19.2. It is left up to the semantics or pragmatics to decide the correct
scope for the focus adverb for a given instance. In terms of the ability of the focus adverbs
to modify at different levels of a noun phrase, the focus adverbs can modify either cardinal
determiners or non-cardinal noun phrases, and cannot modify at the level of noun. The tree
for adverbial modification of noun phrases is shown in Figure 19.13(a).

In addition to at least, the system handles the other two-word adverbs, at most and up
to, and the three-word as-as adverb constructions, where an adjective substitutes between the
two occurrences of as. An example of a three-word as-as adverb is as little as. Except for the
ability of at least to modify many different types of constituents as noted above, the multi-word
adverbs are restricted to modifying cardinal determiners. Example sentences using the trees in
Figure 19.13 are shown in (316)-(320).

• Focus Adverb modifying an NP 

(316) only a member of our crazy family could pull off that kind of a stunt . 



(317) even a flying saucer sighting would seem interesting in comparison 
with your story . 



(318) The report includes a proposal for at least a partial impasse in negotiations .

• Multi-word adverbs modifying cardinal determiners

(319) at most ten people came to the party

(320) They gave monetary gifts of as little as five dollars .

Figure 19.13: Selected Focus and Multi-word Adverb Modifier trees: βARBnx, βPARBd and
βPaPd



The grammar also includes auxiliary trees anchored by multi-word adverbs like a little, a
bit, a mite, sort of, kind of, etc.

Multi-word adverbs like sort of and kind of can pre-modify almost any non-clausal category.
The only strict constraint on their occurrence is that they can't modify nouns (in which case an
adjectival interpretation would obtain).⁶ The category which they scope over can be directly
determined from their position, except when they occur sentence finally, in which case they
are assumed to modify VP's. The complete list of auxiliary trees anchored by these adverbs
is as follows: βNPax, βNPpx, βNPnx, βNPvx, βvxNP, βNParb. Selected trees are shown in
Figure 19.14, and some examples are given in (321)-(324).






Figure 19.14: Selected Multi-word Adverb Modifier trees (for adverbs like sort of, kind of):
βNPax, βNPvx, βvxNP.



(321) John is sort of [ap tired].

(322) John is sort of [pp to the right].

(323) John could have been sort of [vp eating the cake].

(324) John has been eating his cake sort of [adv slowly].

6 Note that there are semantic/lexical constraints even for the categories that these adverbs can modify, and
these no doubt invite a more in-depth analysis.

There are some multi-word adverbs that are, however, not so free in their distribution.
Adverbs like a little, a bit, a mite modify AP's in predicative constructions (sentences with the
copula and small clauses, AP complements in sentences with raising verbs, and AP's when they
are subcategorized for by certain verbs, e.g., John felt angry). They can also post-modify VP's
and PP's, though not as freely as AP's.⁷ Finally, they also function as downtoners for almost
all adverbials.⁸ Some examples are provided in (325)-(328).

(325) Mickey is a little [ap tired].

(326) The medicine [vp has eased John's pain] a little.

(327) John is a little [pp to the right].

(328) John has been reading his book a little [adv loudly].



Following their behavior as described above, the auxiliary trees they anchor are βDAax,
βDApx, βvxDA, βDAarb, βDNax, βDNpx, βvxDN, βDNarb. Selected trees are shown in
Figure 19.15.



19.6 Locative Adverbial Phrases 

Locative adverbial phrases are multi-word adverbial modifiers whose meanings relate to spatial 
location. Locatives consist of a locative adverb (such as ahead or downstream) preceded by 
an NP, an adverb, or nothing, as in Examples (329)-(331) respectively. The modifier as a 
whole describes a position relative to one previously specified in the discourse. The nature of 
the relation, which is usually a direction, is specified by the anchoring locative adverb (behind, 
east). If an NP or a second adverb is present in the phrase, it specifies the degree of the relation 
(for example: three city blocks, many meters, and far). 

(329) The accident three blocks ahead stopped traffic 



7 They can also appear before NP's, as in "John wants a little sugar". However, here they function as
multi-word determiners and should not be analyzed as adverbs.

8 It is to be noted that this analysis, which allows these multi-word adverbs to modify adjectival phrases as
well as adverbials, will yield (not necessarily desirable) multiple derivations for a sentence like John is a little
unnecessarily stupid. In one derivation, a little modifies the AP and in the other, it modifies the adverb.

Figure 19.15: Selected Multi-word Adverb Modifier trees (for adverbs like a little, a bit): βvxDA,
βDAax, βDNpx.



(330) The ship sank far offshore 



(331) The trouble ahead distresses me 

Locatives can modify NPs, VPs and Ss. They modify NPs only by right-adjoining
post-positively, as in Example (329). Post-positive is also the more common position when a locative
modifies either of the other categories. Locatives pre-modify VPs only when separated by
balanced punctuation (commas or dashes). The trees locatives select when modifying NPs are
shown in Figure 19.16.





Figure 19.16: Locative Modifier Trees: βnxnxARB, βnxARB



When the locative phrase consists of only the anchoring locative adverb, as in Example (331),
it uses the βnxARB tree, shown in Figure 19.16, and its VP analogue, βvxARB. In addition,
these are the trees selected when the locative anchor is modified by an adverb expressing degree,
as in Example (330). The degree adverb adjoins onto the anchor using the βARBarb tree, which
is described in Section 19.5. Figure 19.17 shows an example of these trees in action. Though
there is a tree for a pre-sentential locative phrase, βnxARBs, there is no corresponding
post-sentential tree, as it is highly debatable whether the post-sentential version actually has the
entire sentence or just the preceding verb phrase as its scope. Thus, in accordance with XTAG
practice, which considers ambiguous post-sentential modifiers to be VP-modifiers rather than
S-modifiers, there is only a βvxnxARB tree, as shown in Figure 19.17.

Figure 19.17: Locative Phrases featuring NP and Adverb Degree Specifications



One possible analysis of locative phrases with NPs might maintain that the NP is the head,
with the locative adverb modifying the NP. This is initially attractive because of the similarity
to time NPs, which also feature NPs that can modify clauses. This analysis seems insufficient,
however, in light of the fact that virtually any NP can occur in locative phrases, as in example
(332). Therefore, in the XTAG analysis the locative adverb anchors the locative phrase trees.
A complete summary of all trees selected by locatives is contained in Table 19.3. Twenty-six
adverbs select the locative trees.⁹



(332) I left my toupee and putter three holes back 



Category Modified             Degree Phrase Type
                              NP               Ad/None
NP                            βnxnxARB         βnxARB
VP (post)                     βvxnxARB         βvxARB
VP (pre, punct-separated)     βpunxARBpuvx     βpuARBpuvx
S                             βnxARBs          βARBs

Table 19.3: Locative Modifiers
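
As a rough illustration of how Table 19.3 is used, the Python sketch below looks up the tree a locative adverb selects from the category it modifies and the kind of degree phrase that precedes it. The names LOCATIVE_TREES and locative_trees, and the string labels for categories and degree types, are hypothetical; only the tree names and the role of βARBarb for degree adverbs come from the text.

```python
# Hypothetical lookup structure; the tree names are those of Table 19.3.
LOCATIVE_TREES = {
    ("NP", "NP"): "βnxnxARB",
    ("NP", "Ad/None"): "βnxARB",
    ("VP-post", "NP"): "βvxnxARB",
    ("VP-post", "Ad/None"): "βvxARB",
    ("VP-pre", "NP"): "βpunxARBpuvx",
    ("VP-pre", "Ad/None"): "βpuARBpuvx",
    ("S", "NP"): "βnxARBs",
    ("S", "Ad/None"): "βARBs",
}

def locative_trees(category, degree=None):
    """Return the trees involved in a locative phrase.

    category: 'NP', 'VP-post', 'VP-pre' or 'S' (the item modified)
    degree:   'NP', 'adverb' or None (what precedes the locative adverb)
    """
    column = "NP" if degree == "NP" else "Ad/None"
    trees = [LOCATIVE_TREES[(category, column)]]
    if degree == "adverb":
        # A degree adverb adjoins onto the locative anchor with βARBarb.
        trees.append("βARBarb")
    return trees

print(locative_trees("NP", "NP"))           # three blocks ahead, as in (329)
print(locative_trees("VP-post", "adverb"))  # far offshore, as in (330)
```

There is deliberately no post-sentential entry, mirroring the decision described above to treat ambiguous post-sentential locatives as VP modifiers.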



9 Though nearly all of these adverbs are spatial in nature, this number also includes a few temporal adverbs, 
such as ago, that also select these trees. 



Chapter 20 

Auxiliaries 



Although there has been some debate about the lexical category of auxiliaries, the English
XTAG grammar follows [McCawley, 1988], [Haegeman, 1991], and others in classifying
auxiliaries as verbs. The category of verbs can therefore be divided into two sets, main or lexical
verbs, and auxiliary verbs, which can co-occur in a verbal sequence. Only the highest verb in a
verbal sequence is marked for tense and agreement regardless of whether it is a main or auxiliary
verb. Some auxiliaries (be, do, and have) share with main verbs the property of having overt
morphological marking for tense and agreement, while the modal auxiliaries do not. However,
all auxiliary verbs differ from main verbs in several crucial ways.



• Multiple auxiliaries can occur in a single sentence, while a matrix sentence may have at 
most one main verb. 



• Auxiliary verbs cannot occur as the sole verb in the sentence, but must be followed by a 
main verb. 



• All auxiliaries precede the main verb in verbal sequences. 

• Auxiliaries do not subcategorize for any arguments. 

• Auxiliaries impose requirements on the morphological form of the verbs that immediately 
follow them. 



• Only auxiliary verbs invert in questions (with the sole exception in American English of
main verb be).¹

• An auxiliary verb must precede sentential negation (e.g. *John not goes). 

• Auxiliaries can form contractions with subjects and negation (e.g. he'll, won't). 



The restrictions that an auxiliary verb imposes on the succeeding verb limit the sequence of
verbs that can occur. In English, sequences of up to five verbs are allowed, as in sentence (333).

1 Some dialects, particularly British English, can also invert main verb have in yes/no questions (e.g. have
you any Grey Poupon ?). This is usually attributed to the influence of auxiliary have, coupled with the historic
fact that English once allowed this movement for all verbs.

(333) The music should have been being played [for the president] .

The required ordering of verb forms when all five verbs are present is: 

modal base perfective progressive passive 

The rightmost verb is the main verb of the sentence. While a main verb subcategorizes for the 
arguments that appear in the sentence, the auxiliary verbs select the particular morphological 
forms of the verb to follow each of them. The auxiliaries included in the English XTAG grammar 



are listed in Table 20.1 by type. The third column of Table 20.1 lists the verb forms that are 
required to follow each type of auxiliary verb. 



TYPE            LEX ITEMS                            SELECTS FOR
modals          can, could, may, might, will,        base form²
                would, ought, shall, should,         (e.g. will go, might come)
                need
perfective      have                                 past participle
                                                     (e.g. has gone)
progressive     be                                   gerund
                                                     (e.g. is going, was coming)
passive         be                                   past participle
                                                     (e.g. was helped by Jane)
do support      do                                   base form
                                                     (e.g. did go, does come)
infinitive to   to                                   base form
                                                     (e.g. to go, to come)

Table 20.1: Auxiliary Verb Properties
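
The third column of Table 20.1 can be read as a local constraint between adjacent verbs. The Python sketch below checks only that constraint: each auxiliary must be followed by a verb in the form it selects, and the rightmost verb must be the main verb. The names SELECTS and follows_selection, and the form labels base, ppart and ger, are assumptions made for the illustration; the grammar itself enforces these restrictions through feature unification, not through a table lookup like this one.

```python
# Form each auxiliary type requires of the verb immediately after it (Table 20.1).
# SELECTS and follows_selection are illustrative names, not part of XTAG.
SELECTS = {
    "modal": "base",
    "perfective": "ppart",
    "progressive": "ger",
    "passive": "ppart",
    "do": "base",
    "to": "base",
}

def follows_selection(sequence):
    """Check a verbal sequence given as (kind, form) pairs, leftmost verb first."""
    if not sequence or sequence[-1][0] != "main":
        return False  # the rightmost verb must be the main verb
    for (kind, _form), (_next_kind, next_form) in zip(sequence, sequence[1:]):
        # Every non-final verb must be an auxiliary whose selection is satisfied.
        if kind == "main" or SELECTS[kind] != next_form:
            return False
    return True

# (333) should have been being played
sentence_333 = [
    ("modal", "ind"),          # should
    ("perfective", "base"),    # have
    ("progressive", "ppart"),  # been
    ("passive", "ger"),        # being
    ("main", "ppart"),         # played
]
print(follows_selection(sentence_333))
```

This check is purely pairwise, so it does not by itself capture restrictions such as do adjoining only onto main verbs; those are handled by additional features described later in the chapter.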



20.1 Non-inverted sentences 

This section and the sections that follow describe how the English XTAG grammar accounts 
for properties of the auxiliary system described above. 



In our grammar, auxiliary trees are added to the main verb tree by adjunction. Figure 20.1
shows the adjunction tree for non-inverted sentences.³

The restrictions outlined in column 3 of Table 20.1 are implemented through the features
<mode>, <perfect>, <progressive> and <passive>. The syntactic lexicon entries for the
auxiliaries give values for these features on the foot node (VP*) in Figure 20.1. Since the top
features of the foot node must eventually unify with the bottom features of the node it adjoins
onto for the sentence to be valid, this enforces the restrictions made by the auxiliary node. In
addition to these feature values, each auxiliary also gives values to the anchoring node (V◊), to
be passed up the tree to the root VP node (VP_r); there they will become the new features for
the top VP node of the sentential tree. Another auxiliary may now adjoin on top of it, and so
forth. These feature values thereby ensure the proper auxiliary sequencing. Figure 20.2 shows
the auxiliary trees anchored by the four auxiliary verbs in sentence (333). Figure 20.3 shows
the final tree created for this sentence.

Figure 20.1: Auxiliary verb tree for non-inverted sentences: βVvx

2 There are American dialects, particularly in the South, which allow double modals such as might could and
might should. These constructions are not allowed in the XTAG English grammar.

3 We saw this tree briefly in section 4.4.3.2, but with most of its features missing. The full tree is presented
here.
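
The role unification plays here can be sketched with flat feature dictionaries in Python. This is a simplification made for illustration: XTAG feature structures have separate top and bottom parts and use co-indexing, none of which is modeled, and the function name unify and the feature values shown are assumptions, not the system's actual interface.

```python
def unify(a, b):
    """Unify two flat feature dictionaries; return None on a value clash."""
    merged = dict(a)
    for feature, value in b.items():
        if feature in merged and merged[feature] != value:
            return None  # incompatible values: adjunction is ruled out
        merged[feature] = value
    return merged

# Foot node of a perfective auxiliary tree: it selects a past participle.
perfective_foot = {"mode": "ppart"}

# Bottom features of two candidate VP nodes to adjoin onto.
past_participle_vp = {"mode": "ppart"}   # e.g. a VP headed by "gone"
gerund_vp = {"mode": "ger"}              # e.g. a VP headed by "going"

print(unify(perfective_foot, past_participle_vp))  # succeeds
print(unify(perfective_foot, gerund_vp))           # fails
```

In this toy setting, has gone is licensed because the two mode values agree, while a perfective auxiliary over a gerund is rejected because they clash.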

The general English restriction that matrix clauses must have tense (or be imperatives) is 
enforced by requiring the top S-node of a sentence to have <mode>=ind/imp (indicative or 
imperative). Since only the indicative and imperative sentences have tense, non-tensed clauses 
are restricted to occurring in embedded environments. 

Figure 20.2: Auxiliary trees for The music should have been being played

Figure 20.3: The music should have been being played .

Noun-verb contractions are labeled NVC in their part-of-speech field in the morphological
database and then undergo special processing to split them apart into the noun and the reduced
verb before parsing. The noun then selects its trees in the normal fashion. The contraction, say
'll or 'd, likewise selects the normal auxiliary verb tree, βVvx. However, since the contracted
form, rather than the verb stem, is given in the morphology, the contracted form must also be
listed as a separate syntactic entry. These entries have all the same features as the full form
of the auxiliary verbs, with tense constraints coming from the morphological entry (e.g. it's is
listed as it 's NVC 3SG PRES). The ambiguous contractions 'd (had/would) and 's (has/is)
behave like other ambiguous lexical items; there are simply multiple entries for those lexical
items in the lexicon, each with different features. In the resulting parse, the contracted form is
shown with features appropriate to the full auxiliary it represents.
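
The splitting step and the treatment of ambiguous contractions can be pictured with the small Python sketch below. It is not the XTAG preprocessing code: the table CONTRACTIONS, the function split_contraction, and the choice of will as the full form of 'll are assumptions for the example, and the real system consults its morphological database instead of matching on string suffixes.

```python
# Hypothetical table: each reduced verb maps to the full auxiliaries it may stand for.
CONTRACTIONS = {
    "'s": ["has", "is"],      # ambiguous, as noted in the text
    "'d": ["had", "would"],   # ambiguous, as noted in the text
    "'ll": ["will"],          # assumed full form for this sketch
}

def split_contraction(token):
    """Split a noun-verb contraction into (noun, reduced verb, candidate full forms).

    Returns None when the token does not end in a known reduced verb.
    """
    for reduced, full_forms in CONTRACTIONS.items():
        if token.endswith(reduced) and len(token) > len(reduced):
            return token[: -len(reduced)], reduced, list(full_forms)
    return None

print(split_contraction("it's"))
print(split_contraction("he'd"))
print(split_contraction("backpack"))
```

Each candidate full form corresponds to a separate lexicon entry for the reduced verb, so an ambiguous contraction simply yields more than one possible analysis for the parser to try.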



20.2 Inverted Sentences 



In inverted sentences, the two trees shown in Figure 20.4 adjoin to an S tree anchored by a
main verb. The tree in Figure 20.4(a) is anchored by the auxiliary verb and adjoins to the S
node, while the tree in Figure 20.4(b) is anchored by an empty element and adjoins at the VP
node. Figure 20.5 shows these trees (anchored by will) adjoined to the declarative transitive
tree⁴ (anchored by main verb buy).



Figure 20.4: Trees for auxiliary verb inversion: βVs (a) and βVvx (b)



The feature <displ-const> ensures that both of the trees in Figure 20.4 must adjoin to
an elementary tree whenever one of them does. For more discussion of this mechanism, which
simulates tree-local multi-component adjunction, see [Hockey and Srinivas, 1993]. The tree in
Figure 20.4(b), anchored by ε, represents the originating position of the inverted auxiliary. Its
adjunction blocks the <assign-case> values of the VP it dominates from being co-indexed
with the <case> value of the subject. Since <assign-case> values from the VP are blocked,
the <case> value of the subject can only be co-indexed with the <assign-case> value of
the inverted auxiliary (Figure 20.4(a)). Consequently, the inverted auxiliary functions as the
case-assigner for the subject in these inverted structures. This is in contrast to the situation in
uninverted structures where the anchor of the highest (leftmost) VP assigns case to the subject
(see section 4.4.3.2 for more on case assignment). The XTAG analysis is similar to GB accounts
where the inverted auxiliary plus the ε-anchored tree are taken as representing I to C movement.

Figure 20.5: will John buy a backpack

4 The declarative transitive tree was seen in section 5.2

20.3 Do-Support 

It is well-known that English requires a mechanism called 'do-support' for negated sentences 
and for inverted yes-no questions without auxiliaries. 

(334) John does not want a car . 

(335) *John not wants a car . 

(336) John will not want a car . 

(337) Do you want to leave home ? 

(338) *want you to leave home ? 



(339) will you want to leave home ?

20.3.1 In negated sentences



The GB analysis of do-support in negated sentences hinges on the separation of the INFL and
VP nodes (see [Chomsky, 1965], [Jackendoff, 1972] and [Chomsky, 1986]). The claim is that
the presence of the negative morpheme blocks the main verb from getting tense from the INFL
node, thereby forcing the addition of a verbal lexeme to carry the inflectional elements. If
an auxiliary verb is present, then it carries tense, but if not, periphrastic or 'dummy' do is
required. This seems to indicate that do and other auxiliary verbs would not co-occur, and
indeed this is the case (see sentences (340)-(341)). Auxiliary do is allowed in English when no
negative morpheme is present, but this usage is marked as emphatic. Emphatic do is also not
allowed to co-occur with auxiliary verbs (sentences (342)-(345)).



(340) *We will have do bought a sleeping bag . 



(341) *We do will have bought a sleeping bag . 



(342) You do have a backpack, don't you ? 



(343) I do want to go ! 



(344) *You do can have a backpack, don't you ? 



(345) *I did have had a backpack ! 

At present, the XTAG grammar does not have analyses for emphatic do. 

In the XTAG grammar, do is prevented from co-occurring with other auxiliary verbs by a
requirement that it adjoin only onto main verbs (<mainv>=+). It has indicative mode, so
no other auxiliaries can adjoin above it.⁵ The lexical item not is only allowed to adjoin onto a
non-indicative (and therefore non-tensed) verb. Since all matrix clauses must be indicative (or
imperative), a negated sentence will fail unless an auxiliary verb, either do or another auxiliary,
adjoins somewhere above the negative morpheme, not. In addition to forcing adjunction of an
auxiliary, this analysis of not allows it freedom to move around in the auxiliaries, as seen in the
sentences (346)-(349).

(346) John will have had a backpack . 



(347) *John not will have had a backpack 

(348) John will not have had a backpack . 

(349) John will have not had a backpack . 
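
The interaction of these constraints can be approximated procedurally. The Python sketch below encodes three of them over a flat verbal sequence: the topmost verb, and only that verb, is indicative; not must sit directly above a non-indicative verb; and do must sit directly above the main verb. The function licensed and the (word, kind, mode) encoding are hypothetical conveniences for this illustration; imperatives and the form selection of Table 20.1 are left out, and XTAG itself derives these effects from feature unification during adjunction.

```python
def licensed(sequence):
    """Check a verbal sequence of (word, kind, mode) triples, leftmost first.

    kind is 'aux', 'do', 'main' or 'neg'; mode is None for 'neg'.
    """
    verbs = [item for item in sequence if item[1] != "neg"]
    if not verbs or verbs[-1][1] != "main":
        return False
    # Matrix clauses must be indicative, so the topmost element is a tensed verb.
    if sequence[0][1] == "neg" or verbs[0][2] != "ind":
        return False
    # Only the topmost verb carries tense (indicative mode).
    if any(mode == "ind" for _word, _kind, mode in verbs[1:]):
        return False
    for i, (_word, kind, _mode) in enumerate(sequence):
        rest = sequence[i + 1:]
        if kind == "neg":
            # not adjoins only onto a non-indicative verb.
            if not rest or rest[0][1] == "neg" or rest[0][2] == "ind":
                return False
        if kind == "do":
            # do adjoins only onto main verbs (not may intervene).
            following = [item for item in rest if item[1] != "neg"]
            if not following or following[0][1] != "main":
                return False
    return True

will, have, had = ("will", "aux", "ind"), ("have", "aux", "base"), ("had", "main", "ppart")
neg = ("not", "neg", None)

for seq in ([will, have, had], [neg, will, have, had],
            [will, neg, have, had], [will, have, neg, had]):
    print(" ".join(word for word, _kind, _mode in seq), licensed(seq))
```

Run on the four orders in (346)-(349), the check accepts every order except the one with not above the tensed auxiliary, matching the judgments given.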



5 Earlier, we said that indicative mode carries tense with it. Since only the topmost auxiliary carries the tense, 
any subsequent verbs must not have indicative mode. 

20.3.2 In inverted yes/no questions

In inverted yes/no questions, do is required if there is no auxiliary verb to invert, as seen in 
sentences (337)- (339), replicated here as (350)- (352). 

(350) do you want to leave home ? 



(351) *want you to leave home ? 

(352) will you want to leave home ? 



(353) *do you will want to leave home ? 

In English, unlike other Germanic languages, the main verb cannot move to the beginning
of a clause, with the exception of main verb be.⁶ In a GB account of inverted yes/no questions,
the tense feature is said to be in C⁰ at the front of the sentence. Since main verbs cannot move,
they cannot pick up the tense feature, and do-support is again required if there is no auxiliary
verb to perform the role. Sentence (353) shows that do does not interact with other auxiliary
verbs, even when in the inverted position.

In XTAG, trees anchored by a main verb that lacks tense are required to have an auxiliary
verb adjoin onto them, whether at the VP node to form a declarative sentence, or at the S node
to form an inverted question. Do selects the inverted auxiliary trees given in Figure 20.4, just
as other auxiliaries do, so it is available to adjoin onto a tree at the S node to form a yes/no
question. The mechanism described in section 20.3.1 prohibits do from co-occurring with other
auxiliary verbs, even in the inverted position.



20.4 Infinitives 

The infinitive to is considered an auxiliary verb in the XTAG system, and selects the auxiliary 
tree in Figure 20.1. To, like do, does not interact with the other auxiliary verbs, adjoining only 
to main verb base forms, and carrying infinitive mode. It is used in embedded clauses, both 
with and without a complementizer, as in sentences (354)- (356). Since it cannot be inverted, 
it simply does not select the trees in Figure 20.4. 

(354) John wants to have a backpack . 



(355) John wants Mary to have a backpack . 

(356) John wants for Mary to have a backpack . 

The usage of infinitival to interacts closely with the distribution of null subjects (PRO), and 
is described in more detail in section 

6 The inversion of main verb have in British English was previously noted. 

20.5 Semi-Auxiliaries

Under the category of semi-auxiliaries, we have placed several verbs that do not seem to closely 
follow the behavior of auxiliaries. One of these auxiliaries, dare, mainly behaves as a modal 
and selects for the base form of the verb. The other semi-auxiliaries all select for the infinitival 
form of the verb. Examples of this second type of semi-auxiliary are used to, ought to, get to, 
have to, and BE to. 

20.5.1 Marginal Modal dare 

The auxiliary dare is unique among modals in that it both allows DO-support and exhibits 
a past tense form. It clearly falls in modal position since no other auxiliary (except do) may 
precede it in linear order. Examples appear below.

(357) she dare not have been seen . 



(358) she does not dare succeed . 



(359) Jerry dared not look left or right . 



(360) only models dare wear such extraordinary outfits . 



(361) dare Dale tell her the secret ? 



(362) *Louise had dared not tell a soul . 

As mentioned above, auxiliaries are prevented from having DO-support within the XTAG 
system. To allow for DO-support in this case, we had to create a lexical entry for dare that 
allowed it to have the feature mainv+ and to have base mode (this measure is what also
allows dare to occur in double-modal sequences). A second lexical entry was added to handle 
the regular modal occurrence of dare. Additionally, all other modals are classified as being 
present tense, while dare has both present and past forms. To handle this behavior, dare was 
given similar features to the other modals in the morphology minus the specification for tense. 

7 Some speakers accept dare preceded by a modal, as in I might dare finish this report today. In the XTAG
analysis, this particular double modal usage is accounted for. Other cases of double modal occurrence exist in 
some dialects of American English, although these are not accounted for in the system, as was mentioned earlier. 






20.5.2 Other semi-auxiliaries 

The other semi-auxiliaries all select for the infinitival form of the verb. Many of these auxiliaries
allow for DO-support and can appear in both base and past participle forms, in addition to 
being able to stand alone (indicative mode). Examples of this type appear below. 

(363) Alex used to attend karate workshops . 

(364) Angelina might have used to believe in fate . 

(365) Rich did not used to want to be a physical therapist . 

(366) Mick might not have to play the game tonight . 

(367) Singer had to have been there . 

(368) Heather has got to finish that project before she goes insane . 

The auxiliaries ought to and BE to may not be preceded by any other auxiliary. 

(369) Biff ought to have been working harder . 

(370) *Carson does ought to have been working harder . 

(371) the party is to take place this evening . 

(372) *the party had been to take place this evening . 

The trickiest element in this group of auxiliaries is used to. While the other verbs follow
the standard inflection for auxiliaries, used to has the same form whether its mode is
base, past participle, or indicative. The only connection used to maintains with the
infinitival form use is that occasionally, the bare form use will appear with DO-support. Since
the three modes mentioned above are mutually exclusive in terms of both the morphology and
the lexicon, used has three entries in each.
20.5.3 Other Issues

There is a lingering problem with the auxiliaries that stems from the fact that there currently 
is no way to distinguish between the main verb and auxiliary verb behaviors for a given letter 
string within the morphology. This situation results in many unacceptable sentences being 
successfully parsed by the system. Examples of the unacceptable sentences are given below. 

(373) the miller cans tell a good story . (vs the farmer cans peaches in July .) 

(374) David wills have finished by noon . (vs the old man wills his fortune to me .) 

(375) Sarah needs not leave . (vs Sarah needs to leave .) 

(376) Jennifer dares not be seen . (vs the young woman dares him to do the stunt .) 

(377) Lila does use to like beans . (vs Lila does use her new cookware .) 



Chapter 21 

Conjunction 



21.1 Introduction 



The XTAG system can handle sentences with conjunction of two constituents of the same 
syntactic category. The coordinating conjunctions which select the conjunction trees are and, 
or and but. There are also multi-word conjunction trees, anchored by either-or, neither-nor,
both-and, and as well as. There are eight syntactic categories that can be coordinated, and in 
each case an auxiliary tree is used to implement the conjunction. These eight categories can 
be considered as four different cases, as described in the following sections. In all cases the 
two constituents are required to be of the same syntactic category, but there may also be some 
additional constraints, as described below. 



21.2 Adjective, Adverb, Preposition and PP Conjunction 

Each of these four categories has an auxiliary tree that is used for conjunction of two constituents 
of that category. The auxiliary tree adjoins into the left-hand-side component, and the right- 
hand-side component substitutes into the auxiliary tree. 

Figure 21.1(a) shows the auxiliary tree for adjective conjunction, and is used, for example,
in the derivation of the parse tree for the noun phrase the dark and dreary day, as shown in
Figure 21.1(b). The auxiliary tree adjoins onto the node for the left adjective, and the right
adjective substitutes into the right-hand-side node of the auxiliary tree. The analysis for adverb,
preposition and PP conjunction is exactly the same, and there is a corresponding auxiliary tree
for each of these that is identical to that of Figure 21.1(a) except, of course, for the node labels.



21.3 Noun Phrase and Noun Conjunction 

The tree for NP conjunction, shown in Figure 21.2(a), has the same basic analysis as in the
previous section except that the <wh> and <case> features are used to force the two noun 
phrases to have the same <wh> and <case> values. This allows, for example, he and she 
wrote the book together while disallowing *he and her wrote the book together. Agreement is 

1 We believe that the restriction of but to conjoining only two items is a pragmatic one, and our grammar
accepts sequences of any number of elements conjoined by but.
Figure 21.1: Tree for adjective conjunction: βa1CONJa2 and a resulting parse tree



lexicalized, since the various conjunctions behave differently. With and, the root <agr num>
value is <plural>, no matter what the number of the two conjuncts. With or, however, the root
<agr num> is co-indexed with the <agr num> feature of the right conjunct. This ensures
that the entire conjoined NP will bear the number of both conjuncts if they agree (Figure 21.2(b)),
or of the most "recent" one if they differ (Either the boys or John is going to help you.). There
is no rule per se on what the agreement should be here, but people tend to make the verb agree
with the last conjunct (cf. [Quirk et al., 1985], section 10.41 for discussion). The tree for N
conjunction is identical to that for the NP tree except for the node labels. (The multi-word
conjunctions do not select the N conjunction tree: *the both dogs and cats.)
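
The lexicalized agreement behavior of NP conjunction can be illustrated with a small sketch. This is not the XTAG implementation: the dictionary representation, the function names, and the value spellings ("sing", "plur", "nom", "acc") are assumptions made for illustration, and <case> is treated as a single atomic value rather than a disjunction such as nom/acc.

```python
def np_conjunction_ok(left: dict, right: dict) -> bool:
    """Both conjuncts must share their <wh> and <case> values (simplified to equality)."""
    return left["wh"] == right["wh"] and left["case"] == right["case"]


def conjoined_np_number(conj: str, left_num: str, right_num: str) -> str:
    """Return the <agr num> value at the root of the conjoined NP."""
    if conj == "and":
        # With 'and' the root is plural, whatever the number of the conjuncts.
        return "plur"
    if conj == "or":
        # With 'or' the root is co-indexed with the right conjunct.
        return right_num
    raise ValueError(f"no agreement rule sketched for {conj!r}")


# Hypothetical lexical entries, for illustration only.
he = {"wh": "-", "case": "nom", "num": "sing"}
she = {"wh": "-", "case": "nom", "num": "sing"}
her = {"wh": "-", "case": "acc", "num": "sing"}

print(np_conjunction_ok(he, she))                  # he and she wrote the book together
print(np_conjunction_ok(he, her))                  # *he and her wrote the book together
print(conjoined_np_number("and", "sing", "sing"))  # he and she
print(conjoined_np_number("or", "plur", "sing"))   # either the boys or John
```

In the grammar itself these constraints are expressed by co-indexing features in the conjunction tree; the functions above only mimic the resulting behavior.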



21.4 Determiner Conjunction

In determiner coordination, all of the determiner feature values are taken from the left deter- 
miner, and the only requirement is that the <wh> feature is the same, while the other features, 
such as <card>, are unconstrained. For example, which and what and all but one are both 
acceptable determiner conjunctions, but *which and all is not. 

(378) how many and which people camp frequently ? 



(379) *some or which people enjoy nature . 
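
The determiner constraint can be sketched in the same spirit. The feature values assigned to each determiner below (for example, that which is wh+ and some is wh-) are assumptions for illustration, as are the function and field names; only the rule itself (match on <wh>, take every feature value from the left determiner) comes from the description above.

```python
def conjoin_determiners(left: dict, right: dict):
    """Return the features of the conjoined determiner, or None if conjunction is blocked."""
    # The only requirement is that the <wh> values match.
    if left["wh"] != right["wh"]:
        return None
    # All feature values of the result are taken from the left determiner.
    return dict(left)


# Hypothetical determiner entries.
which = {"form": "which", "wh": "+", "card": "-"}
what = {"form": "what", "wh": "+", "card": "-"}
how_many = {"form": "how many", "wh": "+", "card": "-"}
some = {"form": "some", "wh": "-", "card": "-"}
all_ = {"form": "all", "wh": "-", "card": "-"}
one = {"form": "one", "wh": "-", "card": "+"}

print(conjoin_determiners(how_many, which))  # (378) how many and which
print(conjoin_determiners(all_, one))        # all but one: <card> differs, still allowed
print(conjoin_determiners(some, which))      # (379) *some or which
```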



21.5 Sentential Conjunction 



Figure 21.2: Tree for NP conjunction: βCONJnx1CONJnx2 and a resulting parse tree

The tree for sentential conjunction, shown in Figure 21.4, is based on the same analysis as the
conjunctions in the previous two sections, with a slight difference in features. The <mode>
feature is used to constrain the two sentences being conjoined to have the same mode so that the
day is dark and the phone never rang is acceptable, but *the day dark and the phone never rang
is not. Similarly, the two sentences must agree in their <wh>, <comp> and <extracted>
features. Co-indexation of the <comp> feature ensures that either both conjuncts have the
same complementizer, or there is a single complementizer adjoined to the complete conjoined
S. The <assign-comp> feature is used to allow conjunction of infinitival sentences,
such as to read and to sleep is a good life.
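
As a rough illustration of how co-indexed features force the two conjuncts to agree, the sketch below unifies slash-separated atomic values by intersection. The representation and all names are hypothetical, the feature values given to each clause are assumptions (in particular, the label nom for the verbless clause is chosen only to give it a non-indicative mode), and the <assign-comp> feature is left out.

```python
def unify_atomic(a: str, b: str):
    """Unify two slash-separated disjunctive values by intersection; None on failure."""
    common = set(a.split("/")) & set(b.split("/"))
    return "/".join(sorted(common)) if common else None


# Features that are co-indexed between the two conjuncts and the root.
SHARED = ("mode", "wh", "comp", "extracted")


def conjoin_s(left: dict, right: dict):
    """Return the shared root features of the conjoined S, or None if unification fails."""
    root = {}
    for feat in SHARED:
        value = unify_atomic(left[feat], right[feat])
        if value is None:
            return None
        root[feat] = value
    return root


# Hypothetical feature values for the clauses in the examples above.
day_is_dark = {"mode": "ind", "wh": "-", "comp": "nil", "extracted": "-"}
phone_never_rang = {"mode": "ind", "wh": "-", "comp": "nil", "extracted": "-"}
day_dark = {"mode": "nom", "wh": "-", "comp": "nil", "extracted": "-"}

print(conjoin_s(day_is_dark, phone_never_rang))  # modes unify: accepted
print(conjoin_s(day_dark, phone_never_rang))     # modes clash: rejected
```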



21.6 Comma as a conjunction 

We treat comma as a conjunction in conjoined lists. It anchors the same trees as the lexical 
conjunctions, but is considerably more restricted in how it combines with them. The trees 
anchored by commas are prohibited from adjoining to anything but another comma conjoined 
element or a non-coordinate element. (All scope possibilities are allowed for elements coordinated
with lexical conjunctions.) Thus, structures such as Tree 21.5(a) are permitted, with
each element stacking sequentially on top of the first element of the conjunct, while structures
such as Tree 21.5(b) are blocked.



See section i.c for an explanation of the <mode> feature. 
3 See section B.E for an explanation of the <assign-comp> feature. 






Figure 21.3: Tree for determiner conjunction: βd1CONJd2

Figure 21.4: Tree for sentential conjunction: βs1CONJs2

Figure 21.5: (a) Valid tree with comma conjunction; (b) Invalid tree



This is accomplished by using the <conj> feature, which has the values and/or/but
and comma to differentiate the lexical conjunctions from commas. The <conj> values for a
comma-anchored tree and an and-anchored tree are shown in Figure 21.6. The feature <conj>
= comma/none on A1 in (a) only allows comma-conjoined or non-conjoined elements as the
left adjunct, and <conj> = none on A2 in (a) allows only a non-conjoined element as the right
conjunct. We also need the feature <conj> = and/or/but/none on the right conjunct of
the trees anchored by lexical conjunctions like (b), to block comma-conjoined elements from
substituting there. Without this restriction, we would get multiple parses of the NP in Tree
21.5; with the restrictions we only get the derivation with the correct scoping, shown as (a).
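
The effect of these <conj> restrictions can be sketched as a small recursive check. The nested-tuple encoding of structures and the function name are assumptions made for illustration; only the three constraints (comma/none on the left of a comma tree, none on its right, and/or/but/none on the right of a lexical-conjunction tree) are taken from the text.

```python
LEXICAL = {"and", "or", "but"}


def conj_value(node):
    """Return the <conj> value of a structure, or None if the structure is blocked.

    A structure is either a plain word (a string) or a tuple (left, anchor, right),
    where anchor is "comma" or a lexical conjunction.
    """
    if isinstance(node, str):
        return "none"                      # a non-coordinated element
    left, anchor, right = node
    lv, rv = conj_value(left), conj_value(right)
    if lv is None or rv is None:
        return None                        # a blocked substructure blocks the whole
    if anchor == "comma":
        # Left element: comma-conjoined or non-conjoined; right element: non-conjoined.
        ok = lv in {"comma", "none"} and rv == "none"
    else:
        # Lexical conjunctions only bar comma-conjoined elements on the right.
        ok = rv in LEXICAL | {"none"}
    return anchor if ok else None


stacked = (("beautiful", "comma", "red"), "comma", "fragrant")   # as in Tree 21.5(a)
nested = ("beautiful", "comma", ("red", "comma", "fragrant"))    # as in Tree 21.5(b)
mixed = (("large", "comma", "red"), "and", "white")

print(conj_value(stacked))  # licensed
print(conj_value(nested))   # blocked
print(conj_value(mixed))    # licensed: comma list closed by a lexical conjunction
```

Because each list has exactly one licensed bracketing under these checks, the spurious multiple parses mentioned above do not arise.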

Since comma-conjoined lists can appear without a lexical conjunction between the final two 
elements, as shown in example (380), we cannot force all comma-conjoined sequences to end 
with a lexical conjunction. 

(380) So it is too with many other spirits which we all know: the spirit of Nazism or Communism,
school spirit , the spirit of a street corner gang or a football team, the spirit of
Rotary or the Ku Klux Klan. [Brown cd01]




Figure 21.6: βa1CONJa2 (a) anchored by comma and (b) anchored by and



21.7 But-not, not-but, and-not and e-not 

We are analyzing conjoined structures such as The women but not the men with a multi-anchor
conjunction tree anchored by the conjunction plus the adverb not. The alternative is to allow
not to adjoin to any constituent. However, this is the only construction where not can freely
adjoin onto a constituent other than a VP or adjective (cf. βNEGvx and βNEGa trees). It can
also adjoin to some determiners, as discussed in Section 111. We want to allow sentences like
(381) and rule out those like (382). The tree for the good example is shown in Figure 21.7.
There are similar trees for and-not and e-not, where e is interpretable as either and or but, and
a tree with not on the first conjunct for not-but.

(381) Beth grows basil in the house (but) not in the garden . 



(382) *Beth grows basil (but) not in the garden .
Figure 21.7: Tree for conjunction with but-not: βpx1CONJARBpx2

Although these constructions sound a bit odd when the two conjuncts do not have the
same number, they are sometimes possible. The agreement information for such NPs is always
that of the non-negated conjunct: his sons, and not Bill, are in charge of doing the laundry
or not Bill, but his sons, are in charge of doing the laundry. (Some people insist on having the
commas here, but they are frequently absent in corpus data.) The agreement feature from the
non-negated conjunct is passed to the root NP, as shown in Figure 21.8. Aside from agreement,
these constructions behave just like their non-negated counterparts.

21.8 To as a Conjunction 



To can be used as a conjunction for adjectives (Fig. 21.9) and determiners, when they denote
points on a scale: 

(383) two to three degrees 



(384) high to very high temperatures 

As far as we can tell, when the conjuncts are determiners they must be cardinal. 



21.9 Predicative Coordination 

This section describes the method for predicative coordination (including VP coordination of
various kinds) used in XTAG. The description is derived from work described in [Sarkar and
Joshi, 1996]. It is important to note that this implementation of predicative coordination is
not part of the XTAG release at the moment due to massive parsing ambiguities. This is partly
because of the current implementation and partly because of the inherent ambiguities of VP
coordination, which cause a combinatorial explosion for the parser. We are trying to remedy both
of these limitations using a probability model for coordination attachments which will be included as
part of a later XTAG release.

Figure 21.8: Tree for conjunction with not-but: βARBnx1CONJnx2

This extended domain of locality in a lexicalized Tree Adjoining Grammar causes problems
when we consider the coordination of such predicates. Consider (385), for instance: the NP the
beans that I bought from Alice in the Right-Node Raising (RNR) construction has to be shared
by the two elementary trees (which are anchored by cooked and ate respectively).

(385) (((Harry cooked) and (Mary ate)) the beans that I bought from Alice)

We use the standard notion of coordination shown in Figure 21.10, which maps two
constituents of like type, but with different interpretations, into a constituent of the same type.

We add a new operation to the LTAG formalism (in addition to substitution and adjunction)
called conjoin (later we discuss an alternative which replaces this operation by the traditional
operations of substitution and adjunction). While substitution and adjunction take two trees
to give a derived tree, conjoin takes three trees and composes them to give a derived tree.
One of the trees is always the tree obtained by specializing the schema in Figure 21.10 for a
particular category. The tree obtained will be a lexicalized tree, with the conjunction (and,
but, etc.) as its lexical anchor.

Figure 21.9: Example of conjunction with to

Figure 21.10: Coordination schema

The conjoin operation then creates a contraction between nodes in the contraction sets of
the trees being coordinated. The term contraction is taken from the graph-theoretic notion of
edge contraction. In a graph, when an edge joining two vertices is contracted, the nodes are
merged and the new vertex retains edges to the union of the neighbors of the merged vertices.
The conjoin operation supplies a new edge between each corresponding node in the contraction
set and then contracts that edge.
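
The graph-theoretic notion of edge contraction invoked here can be made concrete with a minimal sketch on an undirected graph stored as an adjacency dictionary. The function name and representation are assumptions for illustration; the sketch shows plain edge contraction only, not the conjoin operation on elementary trees.

```python
def contract_edge(graph: dict, u, v, merged):
    """Contract the edge (u, v) of an undirected graph {vertex: set_of_neighbors}.

    The vertices u and v are replaced by a single vertex `merged` that keeps
    edges to the union of their neighbors. The input graph is not modified.
    """
    if v not in graph[u]:
        raise ValueError("u and v are not joined by an edge")
    merged_neighbors = (graph[u] | graph[v]) - {u, v}
    result = {}
    for x, adj in graph.items():
        if x in (u, v):
            continue
        new_adj = set(adj) - {u, v}
        if u in adj or v in adj:
            new_adj.add(merged)        # redirect edges to the merged vertex
        result[x] = new_adj
    result[merged] = merged_neighbors
    return result


# a-b, b-c and a-d; contracting a-b leaves one vertex adjacent to both c and d.
g = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b"}, "d": {"a"}}
print(contract_edge(g, "a", "b", "ab"))
```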

For example, applying conjoin to the trees Conj(and), α(eats) and α(drinks) gives us the
derivation tree and derived structure for the constituent in (386), shown in Figure 21.11.

(386) . . . eats cookies and drinks beer 

Another way of viewing the conjoin operation is as the construction of an auxiliary structure
from an elementary tree. For example, from the elementary tree α(drinks), the conjoin operation
would create the auxiliary structure β(drinks) shown in Figure 21.12. The adjunction
operation would now be responsible for creating contractions between nodes in the contraction
sets of the two trees supplied to it. Such an approach is attractive for two reasons. First, it
uses only the traditional operations of substitution and adjunction. Secondly, it treats conj
X as a kind of "modifier" on the left conjunct X. This approach reduces some of the parsing
ambiguities introduced by the predicative coordination trees and forms the basis of the XTAG
implementation.
Figure 21.11: An example of the conjoin operation. {1} denotes a shared dependency.
More information about predicative coordination can be found in [Sarkar and Joshi, 1996],
including an extension to handle gapping constructions.

21.10 Pseudo-coordination 

The XTAG grammar does handle one sort of verb pseudo-coordination. Semi-idiomatic phrases
such as 'try and' and 'up and' (as in 'they might try and come today') are handled as multi-anchor
modifiers rather than as true coordination. These items adjoin to a V node, using the
βVCONJv tree. This tree adjoins only to verbs in their base morphological (non-inflected) form.
The verb anchor of the βVCONJv tree must also be in its base form, as shown in examples
(387)-(389). This blocks 3rd-person singular derivations, which are the only person morphologically
marked in the present, except when an auxiliary verb is present or the verb is in the infinitive.

(387) *He tried and came yesterday. 

(388) They try and exercise three times a week. 

(389) He wants to try and sell the puppies. 



Chapter 22 

Comparatives 



22.1 Introduction 

Comparatives in English can manifest themselves in many ways, acting on many different
grammatical categories and often involving ellipsis. A distinction must be made at the outset
between two very different sorts of comparatives: those which make a comparison between two
propositions and those which compare the extent to which an entity has one property with the
extent to which it has another property. The former, which we will refer to as propositional
comparatives, is exemplified in (390), while the latter, which we will call metalinguistic
comparatives (following Hellan 1981), is seen in (391):

(390) Ronaldo is more angry than Romario. 

(391) Ronaldo is more angry than upset. 

In (390), the extent to which Ronaldo is angry is greater than the extent to which Romario is 
angry. Sentence (391) indicates that the extent to which Ronaldo is angry is greater than the 
extent to which he is upset. 

Apart from certain of the elliptical cases, both kinds of comparatives can be handled straight- 
forwardly in the XTAG system. Elliptical cases which are not presently covered include those 
exemplified by the following sentences, which would presumably be handled in the same way 
as other sorts of VP ellipsis would. 

(392) Ronaldo is more angry than Romario is. 

(393) Bill eats more broccoli than George eats. 

(394) Bill eats more broccoli than George does. 

We turn to the analysis of metalinguistic comparatives first. 
22.2 Metalinguistic Comparatives

A metalinguistic comparison can be performed on basically all of the predicational categories — 
adjectives, verb phrases, prepositional phrases, and nouns — as in the following examples: 



(395) The table is more long than wide. (AP) 



(396) Clark more makes the rules than follows them. (VP) 



(397) Calvin is more in the living room than in the kitchen. (PP) 



(398) That unidentified amphibian in the bush is more frog than toad, I would say. (NP)

At present, we only deal with the adjectival metalinguistic comparatives as in (395). The 
analysis given here for these can be easily extended to prepositional phrases and nominal com- 
paratives of the metalinguistic sort, but, as with coordination in XTAG, verb phrases will prove 
more difficult. 

Adjectival comparatives appear to distribute with simple adjectives, as in the following 
examples: 

(399) Herbert is more livid than angry. 



(400) Herbert is more livid and furious than angry. 



(401) The more innovative than conventional medication cured everyone in the sick ward. 



(402) The elephant, more wobbly than steady, fell from the circus ball. 



This patterning indicates that we can give these comparatives a tree that adjoins quite
freely onto adjectives, as in Figure 22.1. This tree is anchored by more/less - than. To avoid
grammatically incorrect comparisons such as more brighter than dark, the feature compar
is used to block this tree from adjoining onto morphologically comparative adjectives. The
foot node is compar-, while brighter and its comparative siblings are compar+. We also
wish to block strings like more brightest than dark, which is accomplished with the feature
super, indicating superlatives. This feature is negative at the foot node so that βARBaPa
cannot adjoin to superlatives like nicest, which are specified as super+ from the morphology.
Furthermore, the root node is super+ so that βARBaPa cannot adjoin onto itself and produce
monstrosities such as (403):

(403) *Herbert is more less livid than angry than furious. 
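
The interaction of the compar and super features on this tree can be sketched as follows. The function, the dictionary encoding, and the lexical entries are hypothetical; in particular, the text does not state the compar value at the root, so the sketch simply copies it from the foot, which is an assumption.

```python
def adjoin_arbapa(adj: dict, than_adj: str, degree: str = "more"):
    """Adjoin a more/less-than tree onto an adjective node; None if blocked."""
    # The foot node is compar- and super-, so comparative and superlative
    # adjectives (and results of a previous adjunction) are rejected.
    if adj["compar"] != "-" or adj["super"] != "-":
        return None
    return {
        "form": f"{degree} {adj['form']} than {than_adj}",
        "compar": adj["compar"],   # assumption: not specified in the text
        "super": "+",              # the root node is super+
    }


# Hypothetical morphological entries.
bright = {"form": "bright", "compar": "-", "super": "-"}
brighter = {"form": "brighter", "compar": "+", "super": "-"}
brightest = {"form": "brightest", "compar": "-", "super": "+"}
livid = {"form": "livid", "compar": "-", "super": "-"}

print(adjoin_arbapa(bright, "dark"))     # more bright than dark
print(adjoin_arbapa(brighter, "dark"))   # *more brighter than dark
print(adjoin_arbapa(brightest, "dark"))  # *more brightest than dark

inner = adjoin_arbapa(livid, "angry", degree="less")
print(adjoin_arbapa(inner, "furious"))   # blocked, as in (403)
```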

1 The analysis given later for adjectival propositional comparatives produces aggregated compar+ adjectives
such as more bright, which will also be incompatible (as desired) with βARBaPa.

Figure 22.1: Tree for Metalinguistic Adjective Comparative: βARBaPa

Thus, the super feature serves less to indicate superlativeness specifically than to
indicate that the subtree below a super+ node contains a full-fledged comparison. In the case
of lexical superlatives, the comparison is against everything, implicitly.

A benefit of the multiple-anchor approach here is that we will never allow sentences such 
as (404), which would be permissible if we split the comparative component and the than 
component of metalinguistic comparatives into two separate trees. 

(404) *Ronaldo is angrier than upset. 

We also see another variety of adjectival comparatives of the form more/less than X, which 
indicates some property which is more or less extreme than the property X. In a sentence such 
as (405), some property is being said to hold of Francis such that it is of a kind with stupid and 
that it exceeds stupid on some scale (intelligence, for example). Quirk et al. also note that these 
constructions remark on the inadequacy of the lexical item. Thus, in (405), it could be that
stupid is a starting point from which the speaker makes an approximation for some property 
which the speaker feels is beyond the range of the English lexicon, but which expresses the 
supreme lack of intellect of the individual it is predicated of. 

(405) Francis is more than stupid. 

(406) Romario is more than just upset. 

Taking our inspiration from βARBaPa, we can handle these comparatives, which have the
same distribution but contain an empty adjective, by using the tree shown in Figure 22.2.



This sort of metalinguistic comparative also occurs with the verb phrase, prepositional 
phrase, and noun varieties. 

(407) Clark more than makes the rules. (VP) 

(408) Calvin's hands are more than near the cookie jar. (PP) 

(409) That stuff on her face is more than mud. (NP) 

Presumably, the analysis for these would parallel that for adjectives, though it has not yet been 
implemented. 
Figure 22.2: Tree for Adjective-Extreme Comparative: βARBPa

22.3 Propositional Comparatives 
22.3.1 Nominal Comparatives 

Nominal comparatives are considered here to be those which compare the cardinality of two 
sets of entities denoted by nominal phrases. The following data lay out a basic distribution of 
these comparatives. 

(410) More vikings than mongols eat spam. 

(411) *More the vikings than mongols eat spam. 

(412) Vikings eat less spaghetti than spam. 

(413) More men that walk to the store than women who despise spam enjoyed the football 
game. 

(414) More men than James like scotch on the rocks. 

(415) Elmer knows fewer martians than rabbits. 

Looking at these examples, we are tempted to produce a tree for this construction that is 
similar to /3ARBaPa. However, it is quite common for the than portion of these comparatives 
to be left out, as in the following sentences: 

(416) More vikings eat spam. 




(417) Mongols eat less spam. 
Furthermore, than NP cannot occur without more. These facts indicate that we can and should
build up nominal comparatives with two separate trees. The first, which allows a comparative
adverb to adjoin to a noun, is given in Figure 22.3(a). The second is the noun-phrase modifying
prepositional tree. The tree βCARBn is anchored by more/less/fewer and βCnxPnx is anchored
by than. The feature compar is used to ensure that only one βCARBn tree can adjoin to any
given noun: its foot node is compar- and the root node is compar+. All nouns are compar-,
and the compar value is passed up through all trees which adjoin to N or NP. The compar
feature is also used to ensure that we do not allow sentences like *Vikings than mongols eat spam.
The NP foot node of βCnxPnx is compar+; thus, βCnxPnx will adjoin only to NPs
which have already been modified by βCARBn (and thereby comparativized). In this way, we
capture sentences like (416) en route to deriving sentences like (410), in a principled and simple
manner.
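
The two-tree analysis can be mimicked with a short sketch in which the compar feature gates each adjunction. The function names, the word-list representation, and the conflation of the N and NP levels into one record are simplifying assumptions for illustration, not the XTAG implementation.

```python
def adjoin_carbn(noun: dict, adverb: str):
    """Adjoin a comparative adverb (more/less/fewer) to a noun; None if blocked."""
    if noun["compar"] != "-":          # the foot node of the adverb tree is compar-
        return None
    return {"words": [adverb] + noun["words"], "compar": "+"}   # root is compar+


def adjoin_cnxpnx(np: dict, than_np: dict):
    """Adjoin a than-phrase to a comparativized NP; None if blocked."""
    if np["compar"] != "+":            # the NP foot node of the than tree is compar+
        return None
    return " ".join(np["words"] + ["than"] + than_np["words"])


# All nouns start out compar-.
vikings = {"words": ["vikings"], "compar": "-"}
mongols = {"words": ["mongols"], "compar": "-"}

more_vikings = adjoin_carbn(vikings, "more")
print(" ".join(more_vikings["words"]))       # (416) more vikings ...
print(adjoin_cnxpnx(more_vikings, mongols))  # (410) more vikings than mongols ...
print(adjoin_cnxpnx(vikings, mongols))       # *vikings than mongols: blocked
print(adjoin_carbn(more_vikings, "fewer"))   # a second comparative adverb: blocked
```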



Figure 22.3: Nominal comparative trees: (a) βCARBn tree; (b) βCnxPnx tree
Further evidence for this approach comes from comparative clauses which are missing the
noun phrase which is being compared against something, as in the following: 



(418) The vikings ate more.

(419) The vikings ate more than a boar.

Sometimes the missing noun refers to an entity or set available in the prior discourse, while at 
other times it is a reference to some anonymous, unspecified set. The former is exemplified in 
a mini-discourse such as the following: 



2 We ignore here the interpretation in which the comparison covers the eating event, focusing only on the one
in which the comparison involves the stuff being eaten.

3 This sentence differs from the metalinguistic comparison That stuff on her face is more than mud in that 
it involves a comment on the quantity and/or type of the compared NP, whereas the other expresses that the 
property denoted by the compared noun is an inadequate characterization of the thing being described. 
Calvin: "The mongols ate spam."
Hobbes: "The vikings ate more."

The latter can be seen in the following example:

Calvin: "The vikings ate a boar."
Hobbes: "Indeed. But in fact, the vikings ate more than a boar."



Since the lone comparatives more/less/fewer have the same basic distribution as noun
phrases, the tree in Figure 22.4 is employed to capture this fact. The root node of αCARB is
compar+. Not only does this accord with our intuitions about what the compar feature is
supposed to indicate, it also permits βCnxPnx to adjoin, giving us strings such as more than NP
for free.

Figure 22.4: Tree for Lone Comparatives: αCARB

Thus, by splitting nominal comparatives into multiple trees, we make correct predictions 
about their distribution with a minimal number of simple trees. Furthermore, we now also get 
certain comparative coordinations for free, once we place the requirement that nouns and noun 
phrases must match for compar if they are to be coordinated. This yields strings such as the 
following: 

(420) Julius eats more grapes and fewer boars than avocados. 



(421) Were there more or less than fifty people (at the party)?



The structures are given in Figure 22.5. Also, it will block strings like more men and women than
children under the (impossible) interpretation that there are more men than children but the
comparison of the quantity of women to children is not performed. Unfortunately, it will permit
comparative clauses such as more grapes and fewer than avocados under the interpretation in
which there are more grapes than avocados and fewer of some unspecified thing than avocados
(see Figure 22.6).



One aspect of this analysis is that it handles the elliptical comparatives such as the following: 
Figure 22.5: Comparative conjunctions.

(422) Arnold kills more bad guys than Steven. 

In a sense, this is actually only simulating the ellipsis of these constructions indirectly. However,
consider the following sentences: 

(423) Arnold kills more bad guys than I do. 

(424) Arnold kills more bad guys than I. 

(425) Arnold kills more bad guys than me.

Figure 22.6: Comparative conjunctions.

The first of these has a pro-verb phrase which has a nominative subject. If we totally drop the
second verb phrase, we find that the second NP can be in either the nominative or the accusative
case. Prescriptive grammars disallow accusative case, but it actually is more common to find
accusative case; use of the nominative in conversation tends to sound rather stiff and unnatural.
This accords with the present analysis in which the second noun phrase in these comparatives
is the complement of than in βnxPnx, and receives its case-marking from than. This does
mean that the grammar will not currently accept (424), and indeed such sentences will only
be covered by an analysis which really deals with the ellipsis. Yet the fact that most speakers
produce (425) indicates that some sort of restructuring has occurred that results in the kind of
structure the present analysis offers.

There is yet another distributional fact which falls out of this analysis. When comparative 
or comparativized adjectives modify a noun phrase, they can stand alone or occur with a than 
phrase; furthermore, they are obligatory when a than-phrase is present. 

(426) Hobbes is a better teacher. 

(427) Hobbes is a better teacher than Bill. 

(428) A more exquisite horse launched onto the racetrack. 

(429) A more exquisite horse than Black Beauty launched onto the racetrack. 

(430) *Hobbes is a teacher than Bill.

Comparative adjectives such as better come from the lexicon as compar+. By having trees
such as βAn transmit the compar value of the A node to the root N node, we can signal to
βCnxPnx that it may adjoin when a comparative adjective has adjoined. An example of such
an adjunction is given in Figure 22.7. Of course, if no comparative element is present in the
lower part of the noun phrase, βnxPnx will not be able to adjoin since nouns themselves are
compar-. In order to capture the fact that a comparative element blocks further modification
to N, βAn must only adjoin to N nodes which are compar- in their lower feature matrix.




Figure 22.7: Adjunction of βnxPnx to NP modified by comparative adjective.

In order to obtain this result for phrases like more exquisite horse, we need to provide a
way for more and less to modify adjectives without a than-clause as we have with βARBaPa.
Actually, we need this ability independently for comparative adjectival phrases, as discussed in
the next section.



22.3.2 Adjectival Comparatives 

With nominal comparatives, we saw that a single analysis was amenable to both "pure" com- 
paratives and elliptical comparatives. This is not possible for adjectival comparatives, as the 
following examples demonstrate: 

(431) The dog is less patient. 



(432) The dog is less patient than the cat. 



(433) The dog is as patient. 

(434) The dog is as patient as the cat.



(435) The less patient dog waited eagerly for its master. 



(436) *The less patient than the cat dog waited eagerly for its master. 

The last example shows that comparative adjectival phrases cannot distribute quite as freely 
as comparative nominals. 

The analysis of elliptical comparative adjectives closely follows that of comparative nominals.
We build them up by first adjoining the comparative element to the A node, which then
signals to the AP node, via the compar feature, that it may allow a than-clause to adjoin. The
relevant trees are given in Figure 22.8. βCARBa is anchored by more, less and as, and βaxPnx
is anchored by both than and as.



Figure 22.8: Elliptical adjectival comparative trees: (a) the βCARBa tree and (b) the βaxPnx tree

The advantages of this analysis are many. We capture the distribution exhibited in the
examples given in (431)-(436). With βCARBa, comparative elements may modify adjectives
wherever they occur. However, than-clauses for adjectives have a more restricted distribution
which coincides nicely with the distribution of APs in the XTAG grammar. Thus, by making
them adjoin to AP rather than A, ill-formed sentences like (436) are not allowed.

There are two further advantages to this analysis. One is that βCARBa interacts with
βnxPnx to produce sequences like more exquisite horse than Black Beauty, a result alluded
to at the end of Section 22.3.1. We achieve this by ensuring that the comparativeness of an
adjective is controlled by a comparative adverb which adjoins to it. A sample derivation is
given in Figure 22.9. The second advantage is that we get sentences such as (437) for free.



(437) Hobbes is better than Bill. 

Since better comes from the lexicon as compar+ and this value is passed up to the AP node,
βaxPnx can adjoin as desired, giving us the derivation given in Figure 22.10.

Figure 22.9: Comparativized adjective triggering βCnxPnx.

Notice that the root AP node of Figure 22.10 is compar-, so we are basically saying that
strings such as better than Bill are not "comparative." This accords with our use of the compar
feature: a positive value for compar signals that the clause beneath it is to be compared
against something else. In the case of better than Bill, the comparison has been fulfilled, so we
do not want it to signal for further comparisons. A nice result which follows is that βaxPnx
cannot adjoin more than once to any given AP spine, and we have no need for the NA constraint
on the tree's root node. Also, this treatment of the comparativeness of various strings proves
important in getting the coordination of comparative constructions to work properly.
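
The way compar licenses and then closes off the than-phrase can be pictured with a small Python sketch. This is only a toy under stated assumptions: the dictionary representation and the function names (lexical_ap, adjoin_CARBa, adjoin_axPnx) are invented for illustration and are not part of the XTAG system.

```python
# Toy model of the compar feature on an AP spine. Illustrative only: the
# dict representation and function names are assumptions, not XTAG code.

def lexical_ap(adjective, compar):
    """AP projected from an adjective; compar is passed up from the A node."""
    return {"words": [adjective], "compar": compar}

def adjoin_CARBa(ap, adverb):
    """A comparative adverb (more, less, as) makes the phrase compar+."""
    return {"words": [adverb] + ap["words"], "compar": "+"}

def adjoin_axPnx(ap, prep, np):
    """The than/as phrase needs a compar+ foot; its root is compar-."""
    if ap["compar"] != "+":
        raise ValueError("than/as phrase needs a compar+ AP")
    return {"words": ap["words"] + [prep, np], "compar": "-"}

better = lexical_ap("better", "+")               # compar+ from the lexicon
better_than_bill = adjoin_axPnx(better, "than", "Bill")
more_patient = adjoin_CARBa(lexical_ap("patient", "-"), "more")
```

Because the result of adjoin_axPnx is compar-, a second call on better_than_bill raises an error, which mirrors the observation that the than-phrase cannot adjoin twice on one AP spine.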

A note needs to be made about the analysis regarding the interaction of the equivalence
comparative construction as ... as and the inequivalence comparative construction more/less
... than. In the grammar, more, less, and as all anchor βCARBa, and both than and as
anchor βaxPnx. Without further modifications, this of course will give us sentences such as the
following:

(438) *?Hobbes is as patient than Bill. 



(439) *?Hobbes is more patient as Bill. 

Such cases are blocked with the feature equiv: more, less, fewer and than are equiv- while 
as (in both adverbial and prepositional uses) is equiv+. The prepositional trees then require 
that their P node and the node to which they are adjoining match for equiv. 
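
The equiv check amounts to a simple equality test, which the following Python sketch illustrates. The table of values is taken from the paragraph above; the dictionary and the function name may_adjoin are hypothetical and serve only as an illustration.

```python
# equiv values as stated in the text; the dict and function are illustrative.
EQUIV = {"more": "-", "less": "-", "fewer": "-", "than": "-", "as": "+"}

def may_adjoin(comparative_element, preposition):
    """The P node and the adjunction site must match for equiv."""
    return EQUIV[comparative_element] == EQUIV[preposition]

# (438) *?as patient than Bill and (439) *?more patient as Bill are blocked.
blocked = [may_adjoin("as", "than"), may_adjoin("more", "as")]
allowed = [may_adjoin("as", "as"), may_adjoin("more", "than")]
```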

Figure 22.10: Adjunction of βaxPnx to comparative adjective.

An interesting phenomenon in which comparisons seem to be paired with an inappropriate
as/than-clause is exhibited in (440) and (441).

(440) Hobbes is as patient or more patient than Bill. 

(441) Hobbes is more patient or as patient as Bill. 

Though prescriptive grammars disfavor these sentences, these are perfectly acceptable. We can
capture the fact that the as/than-clause shares the equiv value with the latter of the comparison
phrases by passing the equiv value for the second element to the root of the coordination tree.

22.3.3 Adverbial Comparatives 

The analysis of adverbial comparatives encouragingly parallels the analysis for nominal and 
elliptical adjectival comparatives — with, however, some interesting differences. Some examples 
of adverbial comparatives and their distribution are given in the following: 

(442) Albert works more quickly. 

(443) Albert works more quickly than Richard. 

(444) Albert works more. 

(445) *Albert more works.



(446) Albert works more than Richard. 



(447) Hobbes eats his supper more quickly than Calvin. 



(448) Hobbes more quickly eats his supper than Calvin. 



(449) *Hobbes more quickly than Calvin eats his supper. 



When more is used alone as an adverb, it must also occur after the verb phrase. Also, it 
appears that adverbs modified by more and less have the same distribution as when they are 
not modified. However, the than portion of an adverbial comparative is restricted to post verb 
phrase positions. 

The first observation can be captured by having more and less select only βvxARB from the
set of adverb trees. Comparativization of adverbs looks very similar to that of other categories,
and we follow this trend by giving the tree in Figure 22.11(a), which parallels the adjectival and
nominal trees, for these instances. This handles the quite free distribution of adverbs which
have been comparativized, while the tree in Figure 22.11(b), βvxPnx, allows the than portion of
an adverbial comparative to occur only after the verb phrase, blocking examples such as (449).



Figure 22.11: Adverbial comparative trees: (a) the βCARBarb tree and (b) the βvxPnx tree

The usage of the compar feature parallels that of the adjectives and nominals; however,
trees which adjoin to VP are compar- on their root VP node. In this way, βvxPnx anchored
by than or as (which must adjoin to a compar+ VP) can only adjoin immediately above
a comparative or comparativized adverb. This avoids extra parses in which the comparative
adverb adjoins at a VP node lower than the than-clause.

A final note is that as may anchor βvxPnx non-comparatively, as in sentence (450). This
means that there will be two parses for sentences such as (451).

(450) John works as a carpenter.

(451) John works as quickly as a carpenter. 

This appears to be a legitimate ambiguity. One reading is that John works as quickly as a
carpenter (works quickly), and the other is that John works quickly when he is acting as a
carpenter (but maybe he is slow when he is acting as a plumber).

22.4 Future Work 

• Interaction with determiner sequencing (e.g., several more men than women but not *every 
more men than women). 

• Handle sentential complement comparisons (e.g., Bill eats more pasta than Angus drinks
beer).

• Add partitives. 

• Deal with constructions like as many and as much. 

• Look at the so ... as construction.



Chapter 23 

Punctuation Marks 



Many parsers require that punctuation be stripped out of the input. Since punctuation is often 
optional, this sometimes has no effect. However, there are a number of constructions which 
must obligatorily contain punctuation and adding analyses of these to the grammar without 
the punctuation would lead to severe overgeneration. An especially common example is noun 
appositives. Without access to punctuation, one would have to allow every combinatorial 
possibility of NPs in noun sequences, which is clearly undesirable (especially since there is 
already unavoidable noun-noun compounding ambiguity). Aside from coverage issues, it is 
also preferable to take input "as is" and do as little editing as possible. With the addition 
of punctuation to the XTAG grammar, we need only do/assume the conversion of certain 
sequences of punctuation into the "British" order (this is discussed in more detail below in 
Section 23.2).

The XTAG POS tagger currently tags every punctuation mark as itself. These tags are all 
converted to the POS tag Punct before parsing. This allows us to treat the punctuation marks 
as a single POS class. They then have features which distinguish amongst them. Wherever 
possible we have the punctuation marks as anchors, to facilitate early filtering. 

The full set of punctuation marks is separated into three classes: balanced, separating 
and terminal. The balanced punctuation marks are quotes and parentheses, separating are 
commas, dashes, semi-colons and colons, and terminal are periods, exclamation points and 
question marks. Thus, the <punct> feature is complex (like the <agr> feature), yielding 
feature equations like <Punct bal = paren> or <Punct term = excl>. Separating and 
terminal punctuation marks do not occur adjacent to other members of the same class, but may 
occasionally occur adjacent to members of the other class, e.g. a question mark on a clause 
which is separated by a dash from a second clause. Balanced punctuation marks are sometimes 
adjacent to one another, e.g. quotes immediately inside of parentheses. The <punct> feature 
allows us to control these local interactions. 
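
As a rough illustration of the three classes and the complex punct feature, the following Python sketch maps a punctuation token to the POS tag Punct plus a feature value. It is not the XTAG tagger: the value labels paren, excl, qmark, comma, colon, scolon and dquote occur in this chapter, while the remaining labels, the class key sep, and the function name are assumptions made for the example.

```python
# Illustrative table of punctuation classes; some labels are assumed.
PUNCT_CLASSES = {
    "bal": {'"': "dquote", "'": "squote", "(": "paren", ")": "paren"},
    "sep": {",": "comma", "-": "dash", ";": "scolon", ":": "colon"},
    "term": {".": "per", "!": "excl", "?": "qmark"},
}

def punct_features(token):
    """Return (POS tag, feature dict) for a punctuation token, else None."""
    for cls, marks in PUNCT_CLASSES.items():
        if token in marks:
            return "Punct", {"punct": {cls: marks[token]}}
    return None

excl = punct_features("!")    # corresponds to <Punct term = excl>
paren = punct_features("(")   # corresponds to <Punct bal = paren>
```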

We also need to control non-local interaction of punctuation marks. Two cases of this 
are so-called quote alternation, wherein embedded quotation marks must alternate between 
single and double, and the impossibility of embedding an item containing a colon inside of 
another item containing a colon. Thus, we have a fourth value for <punct>, <contains 
colon/dquote/etc. +/->, which indicates whether or not a constituent contains a particular 
punctuation mark. This feature is percolated through all auxiliary trees. Things which may
not embed are: colons under colons, semi-colons, dashes or commas; and semi-colons under
semi-colons or commas. Although it is rare, parentheses may appear inside of parentheses, say
with a bibliographic reference inside a parenthesized sentence.
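
The embedding restrictions just listed can be written down as a small lookup, sketched here in Python. The table encodes only the two restrictions named in the text; the mark labels, the table name and the function name are illustrative assumptions rather than XTAG feature names.

```python
# For each mark, the set of marks it may not be embedded under (from the text).
BLOCKED_UNDER = {
    "colon": {"colon", "scolon", "dash", "comma"},
    "scolon": {"scolon", "comma"},
}

def may_embed(inner_contains, outer_mark):
    """inner_contains: the set of marks the embedded constituent contains."""
    return all(outer_mark not in BLOCKED_UNDER.get(mark, set())
               for mark in inner_contains)

colon_under_comma = may_embed({"colon"}, "comma")   # not permitted
comma_under_colon = may_embed({"comma"}, "colon")   # permitted
```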

23.1 Appositives, parentheticals and vocatives

These trees handle constructions where additional lexical material is only licensed in conjunction 
with particular punctuation marks. Since the lexical material is unconstrained (virtually any 
noun can occur as an appositive), the punctuation marks are anchors and the other nodes are 
substitution sites. There are cases where the lexical material is restricted, as with parenthetical 
adverbs like however, and in those cases we have the adverb as the anchor and the punctuation 
marks as substitution sites. 

When these constructions can appear inside of clauses (non-peripherally), they must be sep- 
arated by punctuation marks on both sides. However, when they occur peripherally they have 
either a preceding or following punctuation mark. We handle this by having both peripheral 
and non-peripheral trees for the relevant constructions. The alternative is to insert the second 
(following) punctuation mark in the tokenization process (i.e. insert a comma before the period 
when an appositive appears on the last NP of a sentence). However, this is very difficult to do 
accurately. 

23.1.1 βnxPUnxPU

The symmetric (non-peripheral) tree for NP appositives, anchored by comma, dash or
parentheses. It is shown in Figure 23.1 anchored by parentheses.

(452) The music here , Russell Smith's "Tetrameron " , sounded good . [Brown:cc09] 

(453) ...cost 2 million pounds (3 million dollars) 

(454) Sen. David Boren (D., Okla.)... 

(455) ...some analysts believe the two recent natural disasters - Hurricane Hugo and the San 
Francisco earthquake - will carry economic ramifications.... [WSJ] 

The punctuation marks are the anchors and the appositive NP is substituted. The appositive 
can be conjoined, but only with a lexical conjunction (not with a comma). Appositives with 
commas or dashes cannot be pronouns, although they may be conjuncts containing pronouns. 
When used with parentheses this tree actually presents an alternative rather than an appositive, 
so a pronoun is possible. Finally, the appositive position is restricted to having nominative or 
accusative case to block PRO from appearing here. 

Appositives can be embedded, as in (456), but do not seem to be able to stack on a single 
NP. In this they are more like restrictive relatives than appositive relatives, which typically can 
stack. 

(456) ...noted Simon Briscoe, UK economist for Midland Montagu, a unit of Midland Bank 
PLC. 

Figure 23.1: The βnxPUnxPU tree, anchored by parentheses

23.1.2 βnPUnxPU

The symmetric (non-peripheral) tree for N-level NP appositives is anchored by comma. The
modifier is typically an address. It is clear from examples such as (457) that these are attached
at N, rather than NP. Carrier is not an appositive on Menlo Park, as it would be if these
were simply stacked appositives. Rather, Calif. modifies Menlo Park, and that entire complex
is compounded with carrier, as shown in the correct derivation in Figure 23.2. Because this
distinction is less clear when the modifier is peripheral (e.g. ends the sentence), and it would be
difficult to distinguish between NP and N attachment, we do not currently allow a peripheral
N-level attachment.

(457) An official at Consolidated Freightways Inc., a Menlo Park, Calif., less-than-truckload 
carrier , said... 



(458) Rep. Ronnie Flippo (D., Ala.), of the delegation, says... 



23.1.3 βnxPUnx

This tree, which can be anchored by a comma, dash or colon, handles asymmetric (peripheral)
NP appositives and NP colon expansions of NPs. Figure 23.3 shows this tree anchored by a
dash and a colon. Like the symmetric appositive tree, βnxPUnxPU, the asymmetric appositive
cannot be a pronoun, while the colon expansion can. Thus, this constraint comes from the
syntactic entry in both cases rather than being built into the tree.



Figure 23.2: An N-level modifier, using the βnPUnx tree

(459) the bank's 90% shareholder - Petroliam Nasional Bhd. [Brown]

(460) ...said Chris Dillow, senior U.K. economist at Nomura Research Institute . 

(461) ...qualities that are seldom found in one work: Scrupulous scholarship, a fund of personal
experience,... [Brown:cc06]

(462) I had eyes for only one person : him . 

The colon expansion cannot itself contain a colon, so the foot S has the feature
NP.t:<punct contains colon> = -.



23.1.4 βPUpxPUvx

Tree for pre-VP parenthetical PP, anchored by commas or dashes:



(463) John , in a fit of anger , broke the vase 



(464) Mary , just within the last year , has totalled two cars 



These are clearly not NP modifiers. 



Figures 23.4 and 23.5 show this tree alone and as part of the parse for (463).

Figure 23.3: The derived trees for an NP with (a) a peripheral, dash-separated appositive and
(b) an NP colon expansion (uttered by the Mouse in Alice's Adventures in Wonderland)

Figure 23.4: The βPUpxPUvx tree, anchored by commas

Figure 23.5: Tree illustrating the use of βPUpxPUvx



23.1.5 βpuARBpuvx

Parenthetical adverbs - however, though, etc. Since the class of adverbs is highly restricted, this 
tree is anchored by the adverb and the punctuation marks substitute. The punctuation marks 
may be either commas or dashes. Like the parenthetical PP above, these are clearly not NP 
modifiers. 

(465) The new argument over the notification guideline , however , could sour any atmosphere 
of cooperation that existed . [WSJ] 

23.1.6 βsPUnx

Sentence final vocative, anchored by comma: 

(466) You were there , Stanley/my boy . 

Also, when anchored by colon, NP expansion on S. These often appear to be extraposed 
modifiers of some internal NP. The NP must be quite heavy, and is usually a list: 



(467) Of the major expansions in 1960, three were financed under the R. I. Industrial Building 
Authority's 100% guaranteed mortgage plan: Collyer Wire, Leesona Corporation, and 
American Tube & Controls. 

A simplified version of this sentence is shown in Figure 23.6. The NP cannot be a pronoun
in either of these cases. Both vocatives and colon expansions are restricted to appear on tensed
clauses (indicative or imperative).




Figure 23.6: A tree illustrating the use of βsPUnx for a colon expansion attached at S.



23.1.7 βnxPUs

Tree for sentence initial vocatives, anchored by a comma: 

(468) Stanley/my boy , you were there . 

The noun phrase may be anything but a pronoun, although it is most commonly a proper 
noun. The clause adjoined to must be indicative or imperative. 

23.2 Bracketing punctuation 
23.2.1 Simple bracketing 

Trees: βPUsPU, βPUnxPU, βPUnPU, βPUvxPU, βPUvPU, βPUarbPU, βPUaPU, βPUdPU,
βPUpxPU, βPUpPU

These trees are selected by parentheses and quotes and can adjoin onto any node type, whether
a head or a phrasal constituent. This handles things in parentheses or quotes which are
syntactically integrated into the surrounding context. Figure 23.7 shows the βPUsPU tree
anchored by parentheses, and this tree along with βPUnxPU in a derived tree.

(469) Dick Carroll and his accordion (which we now refer to as "Freida") held over at Bahia 
Cabana where "Sir" Judson Smith brings in his calypso capers Oct. 13 . [Brown:ca31] 

(470) ...noted that the term "teacher-employee" (as opposed to, e.g., "maintenance employee") 
was a not inapt description. [Brown:ca35] 



Figure 23.7: βPUsPU anchored by parentheses, and in a derivation, along with βPUnxPU

There is a convention in English that quotes embedded in quotes alternate between single 
and double; in American English the outermost are double quotes, while in British English they 
are single. The contains feature is used to control this alternation. The trees anchored by 
double quotation marks have the feature punct contains dquote = - on the foot node and 
the feature punct contains dquote = + on the root. All adjunction trees are transparent to 
the contains feature, so if any tree below the double quote is itself enclosed in double quotes 
the derivation will fail. Likewise with the trees anchored by single quotes. The quote trees in 
effect "toggle" the contains Xquote feature. Immediate proximity is handled by the punct 
balanced feature, which allows quotes inside of parentheses, but not vice-versa. 
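
Quote alternation through the contains feature can be sketched in Python as follows. This is a toy reading of the mechanism, not the grammar's feature equations: the foot check and the root value for the quote's own kind follow the text, while resetting the other kind at the root is our assumption about what the "toggle" does. The function and dictionary names are hypothetical.

```python
QUOTE_CHARS = {"dquote": '"', "squote": "'"}

def wrap_in_quotes(constituent, kind):
    """Adjoin a quote tree of the given kind ('dquote' or 'squote')."""
    contains = constituent["contains"]
    # Foot node: the enclosed material must not already contain this kind.
    if contains.get(kind, "-") == "+":
        raise ValueError("constituent already contains " + kind)
    other = "squote" if kind == "dquote" else "dquote"
    new_contains = dict(contains)
    new_contains[kind] = "+"    # root node: now contains this kind
    new_contains[other] = "-"   # assumption: the "toggle" resets the other kind
    q = QUOTE_CHARS[kind]
    return {"text": q + constituent["text"] + q, "contains": new_contains}

word = {"text": "Freida", "contains": {}}
double = wrap_in_quotes(word, "dquote")
single_over_double = wrap_in_quotes(double, "squote")
```

Wrapping double in double quotes again raises an error, whereas wrapping single_over_double in double quotes succeeds, giving the alternating pattern.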

In addition, American English typically places periods (and commas) inside of quotation
marks when they would logically occur outside, as in example (471). The comma in the
first part of the quote is not part of the quote, but rather part of the parenthetical quoting
clause. However, by convention it is shifted inside the quote, as is the final period. British
English does not do this. We assume here that the input has already been tokenized into the
"British" format.



(471) "You can't do this to us ," Diane screamed . "We are Americans."
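
The conversion into "British" order that we assume could be approximated by a preprocessing step along the following lines. This Python sketch is a naive illustration and not part of the XTAG tokenizer: the function name is hypothetical, quotes are paired by simple counting, and every comma or period directly before a closing double quote is moved, even in cases where it genuinely belongs to the quoted material.

```python
def to_british_order(tokens):
    """Move a comma or period that directly precedes a closing double quote
    to the position directly after that quote. Assumes straight, properly
    paired double quotes."""
    out = []
    inside_quote = False
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in {",", "."} and nxt == '"' and inside_quote:
            out.extend(['"', tok])   # swap: closing quote first, then the mark
            inside_quote = False
            i += 2
            continue
        if tok == '"':
            inside_quote = not inside_quote
        out.append(tok)
        i += 1
    return out

american = ['"', "You", "can't", "do", "this", "to", "us", ",", '"', "Diane",
            "screamed", ".", '"', "We", "are", "Americans", ".", '"']
british = to_british_order(american)
```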



The βPUsPU tree can handle quotation marks around multiple sentences, since the βsPUs tree
allows us to join two sentences with a period, exclamation point or question mark. Currently,
however, we cannot handle the style where only an open quote appears at the beginning of a
paragraph when the quotation extends over multiple paragraphs. We could allow a lone open
quote to select the βPUs tree, if this is deemed desirable.

Also, the βPUsPU tree is selected by a pair of commas to handle non-peripheral appositive
relative clauses, such as in example (472). Restrictive and appositive relative clauses are not
syntactically differentiated in the XTAG grammar (cf. Chapter 14).



(472) This news , announced by Jerome Toobin , the orchestra's administrative director ,
brought applause ... [Brown:cc09]

The trees discussed in this section will only allow balanced punctuation marks to adjoin to 
constituents. We will not get them around non-constituents, as in (473). 

(473) Mary asked him to leave (and he left) 

23.2.2 βsPUsPU

This tree allows a parenthesized clause to adjoin onto a non-parenthesized clause. 



(474) Innumerable motels from Tucson to New York boast swimming pools ( " swim at your
own risk " is the hospitable sign poised at the brink of most pools ) . [Brown:ca17]



23.3 Punctuation trees containing no lexical material 
23.3.1 αPU

This is the elementary tree for substitution of punctuation marks. This tree is used in the
quoted speech trees, where including the punctuation mark as an anchor along with the verb
of saying would require a new entry for every tree selecting the relevant tree families. It is also
used in the tree for parenthetical adverbs (βpuARBpuvx), and for S-adjoined PPs and adverbs
(βspuARB and βspuPnx).

23.3.2 βPUs

Anchored by comma: allows comma-separated clause-initial adjuncts, (475)-(476).

(475) Here , as in "Journal" , Mr. Louis has given himself the lion's share of the dancing... 
[Brown:cc09] 

(476) Choreographed by Mr. Nagrin, the work filled the second half of a program 

To keep this tree from appearing on root Ss (i.e., sentences), we have a root constraint that
<punct struct = nil> (similar to the requirement that root Ss be tensed, i.e. <mode = ind/imp>).
The <punct struct = nil> feature on the foot blocks stacking of multiple punctuation marks.
This feature is shown in the tree in Figure 23.8.



Figure 23.8: βPUs, with features displayed

This tree can also be used by adjuncts on embedded clauses:

(477) One might expect that in a poetic career of seventy-odd years, some changes in style and
method would have occurred, some development taken place. [Brown:cj65] 

These adjuncts sometimes have commas on both sides of the adjunct, or, like (477), only 
have them at the end of the adjunct. 

Finally, this tree is also used for peripheral appositive relative clauses. 

(478) Interest may remain limited into tomorrow's U.K. trade figures, which the market will 
be watching closely to see if there is any improvement after disappointing numbers in the 
previous two months. 

23.3.3 βsPUs

This tree handles clausal "coordination" with comma, dash, colon, semi-colon or any of the 
terminal punctuation marks. The first clause must be either indicative or imperative. The 
second may also be infinitival with the separating punctuation marks, but must be indicative or 
imperative with the terminal marks; with a comma, it may only be indicative. The two clauses 
need not share the same mode. NB: Allowing the terminal punctuation marks to anchor this 
tree allows us to parse sequences of multiple sentences. This is not the usual mode of parsing; 
if it were, this sort of sequencing might be better handled by a higher level of processing. 
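
The mode restrictions on the two clauses can be summarized as a small predicate. The Python sketch below simply restates the conditions in the paragraph above; the mode labels ind, imp and inf follow the feature values used in this chapter, while the mark labels, the set names and the function name are illustrative assumptions.

```python
SEPARATING = {"comma", "dash", "colon", "scolon"}
TERMINAL = {"per", "excl", "qmark"}

def sPUs_allows(first_mode, mark, second_mode):
    """Check the clause modes permitted around a given punctuation mark."""
    if first_mode not in {"ind", "imp"}:
        return False                      # first clause: indicative or imperative
    if mark == "comma":
        return second_mode == "ind"       # with a comma, only indicative
    if mark in SEPARATING:
        return second_mode in {"ind", "imp", "inf"}
    if mark in TERMINAL:
        return second_mode in {"ind", "imp"}
    return False

colon_then_infinitive = sPUs_allows("ind", "colon", "inf")
comma_then_imperative = sPUs_allows("ind", "comma", "imp")
```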

(479) For critics , Hardy has had no poetic periods - one does not speak of early Hardy or late 
Hardy , or of the London or Max Gate period.... 



(480) Then there was exercise , boating and hiking , which was not only good for you but also 
made you more virile : the thought of strenuous activity left him exhausted. 

This construction is one of the few where two non-bracketing punctuation marks can be 
adjacent. It is possible (if rare) for the first clause to end with a question mark or exclamation 
point, when the two clauses are conjoined with a semi-colon, colon or dash. Features on the
foot node, as shown in Figure 23.9, control this interaction.



Complementizers are not permitted on either conjunct. Subordinating conjunctions some- 
times appear on the right conjunct, but seem to be impossible on the left: 

(481) Killpath would just have to go out and drag Gun back by the heels once an hour ; because
he'd be damned if he was going to be a mid-watch pencil-pusher . [Brown:cl17]



(482) The best rule of thumb for detecting corked wine (provided the eye has not already 
spotted it) is to smell the wet end of the cork after pulling it : if it smells of wine , 
the bottle is probably all right ; if it smells of cork , one has grounds for suspicion. 
[Brown:cf27] 

Figure 23.9: βsPUs, with features displayed



23.3.4 βsPU



This tree handles the sentence final punctuation marks when selected by a question mark, 
exclamation point or period. One could also require a final punctuation mark for all clauses, 
but such an approach would not allow non-periods to occur internally, for instance before a
semi-colon or dash as noted above in Section 23.3.3. This tree currently only adjoins to
indicative or imperative (root) clauses.

(483) He left ! 



(484) Get lost . 



(485) Get lost ? 

The feature punct bal = nil on the foot node ensures that this tree only adjoins inside
of parentheses or quotes completely enclosing a sentence (486), but does not restrict it from
adjoining to a clause which ends with balanced punctuation if only the end of the clause is
contained in the parentheses or quotes (487).

(486) (John then left .)

(487) (John then left) . 

(488) Mary asked him to leave (immediately) . 

This tree is also selected by the colon to handle a colon expansion after an adjunct clause:

(489) Expressed differently : if the price for becoming a faithful follower... [Brown:cd02] 

(490) Expressing it differently : if the price for becoming a faithful follower... 

(491) To express it differently : if the price for becoming a faithful follower... [Brown:cd02] 

This tree is only used after adjunct (untensed) clauses, which adjoin to the tensed clause
using the adjunct clause trees (cf. Section 15); the mode of the complete clause is that of the
matrix rather than the adjunct. Indicative or imperative (i.e. root) clauses separated by a
colon use the βsPUs tree (Section 23.3.3).

23.3.5 βvPU

This tree is anchored by a colon or a dash, and occurs between a verb and its complement. 
These typically are lists. 

(492) Printed material Available , on request , from U.S. Department of Agriculture ,
Washington 25 , D.C. , are : Cooperative Farm Credit Can Assist [Brown:ch01]

23.3.6 βpPU

This tree is anchored by a colon or a dash, and occurs between a preposition and its complement. 
It typically occurs with a sequence of complements. As with the tree above, this typically occurs 
with a conjoined complement. 

(493) ...and utilization such as : (A) the protection of forage... 

(494) ...can be represented as : Af. 

23.4 Other trees 
23.4.1 βspuARB

In general, we attach post-clausal modifiers at the VP node, as you typically get scope ambiguity 
effects with negation (John didn't leave today - did he leave or not?). However, with post- 
sentential, comma-separated adverbs, there is no ambiguity - in John didn't leave, today he 
definitely did not leave. Since this tree is only selected by a subset of the adverbs (namely, 
those which can appear pre-sententially, without a punctuation mark), it is anchored by the 
adverb. 



(495) The names of some of these products don't suggest the risk involved in buying them , 
either . [WSJ] 






23.4.2 βspuPnx

Clause-final PP separated by a comma. Like the adverbs described above, these differ from VP 
adjoined PPs in taking widest scope. 

(496) ...gold for current delivery settled at $367.30 an ounce , up 20 cents . 

(497) It increases employee commitment to the company , with all that means for efficiency 
and quality control . 

23.4.3 βnxPUa

This tree is anchored by a colon or a dash, and allows for post-modification of NPs by adjectives.

(498) Make no mistake , this Gorky Studio drama is a respectable import - aptly grave , 
carefully written , performed and directed . 



Part V 

Appendices 
Appendix A

Future Work 



A.1 Adjective ordering

At this point, the treatment of adjectives in the XTAG English grammar does not include
selectional or ordering restrictions.¹ Consequently, any adjective can adjoin onto any noun and
on top of any other adjective already modifying a noun. All of the modified noun phrases shown
in (499)-(502) currently parse.

(499) big green bugs 



(500) big green ideas 



(501) colorless green ideas 



(502) *green big ideas 

While (500)-(502) are all semantically anomalous, (502) also suffers from an ordering problem
that makes it seem ungrammatical as well. Since the XTAG grammar focuses on syntactic
constructions, it should accept (499)-(501) but not (502). Both the auxiliary and determiner
ordering systems are structured on the idea that certain types of lexical items (specified by
features) can adjoin onto some types of lexical items, but not others. We believe that an analysis
of adjectival ordering would follow the same type of mechanism.



A.2 More work on Determiners

In addition to the analysis described in Chapter 18, there remains work to be done to complete
the analysis of determiner constructions in English.² Although constructions such as determiner
coordination are easily handled if overgeneration is allowed, blocking sequences such as one and
some while allowing sequences such as five or ten still remains to be worked out. There are
still a handful of determiners that are not currently handled by our system. We do not have
an analysis to handle most, such, certain, other and own.³ In addition, there is a set of lexical
items that we consider adjectives (enough, less, more and much) that have the property that
they cannot cooccur with determiners. We feel that a complete analysis of determiners should
be able to account for this phenomenon, as well.

1 This section is a repeat of information found in Section 19.1.
2 This section is from [Hockey and Mateyak, 1998].

A.3 -ing adjectives

An analysis has already been provided for past participle (-ed) adjectives (as in sentence (503)),
which are restricted to the Transitive Verb family.⁴ A similar analysis is needed for
the present participle (-ing) used as a pre-nominal modifier. This type of adjective, however,
does not seem to be as restricted as the -ed adjectives, since verbs in other tree families seem
to exhibit this alternation as well (e.g. sentences (504) and (505)).

(503) The murdered man was a doctoral student at UPenn . 

(504) The man died . 

(505) The dying man pleaded for his life . 

A.4 Verb selectional restrictions

Although we explicitly do not want to model semantics in the XTAG grammar, there is some
work along the syntax/semantics interface that would help reduce syntactic ambiguity and
thus decrease the number of semantically anomalous parses. In particular, verb selectional
restrictions, particularly for PP arguments and adjuncts, would be quite useful. With the
exception of the required to in the Ditransitive with PP Shift tree family (Tnx0Vnx1Pnx2),
any preposition is allowed in the tree families that have prepositions as their arguments. In
addition, there are no restrictions as to which prepositions are allowed to adjoin onto a given
verb. The sentences in (506)-(508) are all currently accepted by the XTAG grammar. Their
violations are stronger than would be expected from purely semantic violations, however, and
the presence of verb selectional restrictions on PPs would keep these sentences from being
accepted.

(506) #Survivors walked of the street .

(507) #The man about the earthquake survived .

(508) #The president arranged on a meeting .

3 The behavior of own is sufficiently unlike other determiners that it most likely needs a tree of its own, 
adjoining onto the right-hand side of genitive determiners. 

4 This analysis may need to be extended to the Transitive Verb particle family as well. 
A.5 Thematic Roles

Elementary trees in TAGs capture several notions of locality, with the most primary of these
being locality of θ-role assignment. Each elementary tree has associated with it the θ-roles
assigned by the anchor of that elementary tree. In the current XTAG system, while the notion
of locality of θ-role assignment within an elementary tree has been implicit, the θ-roles assigned
by a head have not been explicitly represented in the elementary tree. Incorporating θ-role
information will make the elementary trees more informative and will enable efficient pruning
of spurious derivations when embedded into a specific context. In the case of a Synchronous
TAG, θ-roles can also be used to automatically establish links between two elementary trees,
one in the object language and one in the target language.



Appendix B 

Metarules 



B.1 Introduction

XTAG now has a collection of functions accessible from the user interface that help the user in
the construction and maintenance of a TAG tree-grammar. This subsystem is based on the idea of
metarules [Becker, 1993]. Here our primary purpose is to describe the facilities implemented
under this metarule-based subsystem. For a discussion of metarules as a method for compact
representation of the lexicon see [Becker, 1993] and [Srinivas et al., 1994].

The basic idea of using metarules is to take advantage of the similarities of the relations involving
related pairs of XTAG elementary trees. For example, in the English grammar described in
this technical report, comparing the XTAG trees for the basic form and the wh-subject moved
form, the relation between these two trees for transitive verbs (αnx0Vnx1, αW0nx0Vnx1) is
similar to the relation for the intransitive verbs (αnx0V, αW0nx0V) and also to the relation for
the ditransitives (αnx0Vnx1nx2, αW0nx0Vnx1nx2). Hence, instead of generating by hand the
six trees mentioned above, a more natural and robust way would be to generate by hand only
the basic trees for the intransitive, transitive and ditransitive cases, and to let the wh-subject
moved trees be generated automatically by the application of a single transformation rule
that accounts exactly for the identical relation involved in each of the three pairs above.

Notice that the degree of generalization can be much higher than the above paragraph might
suggest. For example, once a rule for passivization is applied to the
three different basic trees above, the wh-subject moved rule could be applied again to generate
the wh-moved subject versions for the passive form. Depending on the degree of regularity that
one can find in the grammar being built, the reduction in the number of original trees can be
exponential.

We also stress that the reduction of effort in grammar construction is not the
only advantage of the approach. Robustness, reliability and maintainability of the grammar
achieved by the use of metarules are equally or even more important.

In the next section we define a metarule in XTAG. Section B.3 gives some linguistically
motivated examples of metarules for the English grammar described in this technical report and
their application. Section B.4 describes the access through the user interface.
B.2 The definition of a metarule in XTAG

A metarule specifies a rule for transforming grammar rules into grammar rules. In XTAG the 
grammar rules are lexicalized trees. Hence an XTAG metarule mr is a pair (lhs, rhs) of XTAG 
trees, where: 

• lhs, the left-hand side of the metarule, is a pattern tree, i.e., it is intended to present a 
specific pattern of tree to look for in the trees submitted to the application of the metarule. 

• When a metarule mr is applied to an input tree inp, the first step is to verify if the input 
tree matches the pattern specified by the lhs. If there is no match, the application fails. 

• rhs, the right-hand side of the metarule, specifies (together with lhs) the transformation
that will be done in inp, in case of successful matching, thus generating the output tree
of the metarule application.¹

B.2.1 Node names, variable instantiation, and matches 

We will use the terms (lhs, rhs and inp) as introduced above to refer to the parts of a generic 
metarule being applied to an input tree. 

The nodes at lhs can take three different forms: a constant node, a typed variable node,
and a non-typed variable node. The naming conventions for these different classes of nodes are
given below.

• Constant Node: Its name must not begin with a question mark ('?' character). Such names
are what we expect in normal XTAG trees; for instance, inp is expected
to have only constant nodes. Some examples of constant nodes are NP, V, NP_0, NP_1,
S_r. We will call the two parts that compose such names the stem and the subscript. In
the examples above NP, V and S are stems and 0, 1, r are subscripts. Notice that the
subscript part can also be empty, as in two of the above examples.

• Non-Typed Variable Node: Its name begins with a question mark ('?'), followed by a
sequence of digits (i.e. a number) which uniquely identifies the variable. Examples: ?1,
?3, ?3452.² We assume that there is no stem and no subscript in these names, i.e., '?' is
just a meta-character to introduce a variable, and the number is the variable identifier.

• Typed Variable Node: Its name begins with a question mark ('?') followed by a
sequence of digits, but is additionally followed by a type specifiers definition. A type
specifiers definition is a sequence of one or more type specifiers separated by a slash ('/').
A type specifier has the same form as a regular XTAG node name (like the constant nodes),
except that the subscript can also be a question mark. Examples of typed variables are:
?1VP (a single type specifier with stem VP and no subscript), ?3NP_1/PP (two type
specifiers, NP_1 and PP), ?1NP_? (one type specifier, NP_? with undetermined subscript).

1 Actually, more than one output tree can be generated from the successful application of a rule to an input
tree, as will be seen soon.

2 Notice, however, that since its sole purpose is to distinguish between variables, a number like the one in
the last example is not very likely to occur, and a metarule with more than three thousand variables can give
you a place in the Guinness TagBook of Records.
We'll see ahead that each type specifier represents an alternative for matching, and the
presence of '?' in subscript position of a type specifier means that matching will only
check for the stem.³
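The naming conventions above can be sketched as a small classifier. This is only an illustration, not part of the XTAG system: it assumes a plain-text spelling in which an underscore separates stem and subscript (NP_1, S_r, NP_?), and the names `NodeName`, `TypeSpec`, `split_stem` and `parse_node_name` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TypeSpec:
    stem: str
    subscript: Optional[str]  # None = no subscript, "?" = any subscript


@dataclass(frozen=True)
class NodeName:
    kind: str                       # "constant", "untyped" or "typed"
    var_id: Optional[int] = None    # variable identifier, for variables only
    stem: Optional[str] = None      # for constant nodes only
    subscript: Optional[str] = None
    specs: Tuple[TypeSpec, ...] = ()


def split_stem(name: str) -> Tuple[str, Optional[str]]:
    """Split 'NP_1' into ('NP', '1') and 'VP' into ('VP', None)."""
    stem, sep, sub = name.partition("_")
    return stem, (sub if sep else None)


def parse_node_name(name: str) -> NodeName:
    """Classify a node name (assumed underscore spelling, see lead-in)."""
    if not name.startswith("?"):
        stem, sub = split_stem(name)
        return NodeName("constant", stem=stem, subscript=sub)
    m = re.fullmatch(r"\?(\d+)(.*)", name)
    if m is None:
        raise ValueError(f"malformed variable name: {name!r}")
    var_id, rest = int(m.group(1)), m.group(2)
    if rest == "":
        # '?' followed only by digits: a non-typed variable
        return NodeName("untyped", var_id=var_id)
    # Anything after the digits is a '/'-separated list of type specifiers
    specs = tuple(TypeSpec(*split_stem(s)) for s in rest.split("/"))
    return NodeName("typed", var_id=var_id, specs=specs)
```

For instance, `parse_node_name("?3NP_1/PP")` yields a typed variable with identifier 3 and the two type specifiers NP_1 and PP. The sketch would misread a stem that itself begins with a digit, which the conventions above do not seem to anticipate.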

During the process of matching, variables are associated (we use the term instantiated) with 
'tree material'. According to its class a variable can be instantiated with different kinds of tree 
material: 

• A typed variable will be instantiated with exactly one node of the input tree, which is in
accordance with one of its type specifiers (the full rule is in the following subsection).

• A non-typed variable will be instantiated with a range of subtrees. These subtrees will
be taken from one of the nodes of the input tree inp. Hence, there will be a node n in inp,
with subtrees n.t_1, n.t_2, ..., n.t_k, in this order, where the variable will be instantiated with
some subsequence of these subtrees (e.g., n.t_2, n.t_3, n.t_4). Note, however, that some of
these subtrees may be incomplete, i.e., they may not go all the way to the bottom leaves.
Entire subtrees may be removed. Actually, for each child of the non-typed variable node,
one subtree that matches this child subtree will be removed from some of the n.t_i (maybe
an entire n.t_i), leaving in place a mark for inserting material during the substitution of
occurrences at rhs.

Note also that the variable can be instantiated with a single tree and even with no tree.

We define a match to be a complete instantiation of all variables appearing in the metarule. 
In the process of matching, there may be several possible ways of instantiating the set of 
variables of the metarule, i.e., several possible matches. This is due to the presence of non- 
typed variables. 

Now we are ready to define what we mean by a successful matching. The process of
matching is successful if the number of possible matches is greater than 0. When there is no
possible match the process is said to fail. In addition to returning success or failure, the process
also returns the set of all possible matches, which will be used for generating the output.

B.2.2 Structural Matching 

The process of matching lhs and inp can be seen as a recursive procedure for matching trees,
starting at their roots and proceeding in a top-down style along with their subtrees. In the
explanation of this process that follows we use the term lhs not only to refer to the
whole tree that contains the pattern but to any of its subtrees that is being considered in a
given recursive step. The same applies to inp. For now we ignore feature equations, which will
be accounted for in the next subsection.

The process described below returns at the end the set of matches (where an empty set 
means the same as failure). We first give one auxiliary definition, of valid Mapping, and one 
recursive function Match, that matches lists of trees instead of trees, and then define the process 
of matching two trees as a special case of call to Match. 



3 This is different from not having a subscript, which is interpreted as checking that the input tree node has no
subscript for matching.
Given a list list_lhs = [lhs_1, lhs_2, ..., lhs_l] of nodes of lhs and a list list_inp = [inp_1, inp_2, ..., inp_k]
of nodes of inp, we define a mapping from list_lhs to list_inp to be a function Mapping that
assigns to each element of list_lhs a list of elements of list_inp, subject to the following condition:

concatenation(Mapping(lhs_1), Mapping(lhs_2), ..., Mapping(lhs_l)) = list_inp

i.e., the elements of list_inp are split into sublists and assigned, in order of appearance in the
list, to the elements of list_lhs.

We say that a mapping is a valid mapping if for all j, 1 ≤ j ≤ l (where l is the length of
list_lhs), the following restrictions apply:

1. if lhs_j is a constant node, then Mapping(lhs_j) must have a single element, say, rhs_g(j),
and the two nodes must have the same name and agree on the markers (foot, substitution,
head and NA), i.e., if lhs_j is NA, then rhs_g(j) must be NA, if lhs_j has no markers, then
rhs_g(j) must have no markers, etc.

2. if lhs_j is a typed variable node, then Mapping(lhs_j) must have a single element, say,
rhs_g(j), and rhs_g(j) must be marker-compatible and type-compatible with lhs_j.

rhs_g(j) is marker-compatible with lhs_j if any marker (foot, substitution, head and NA)
present in lhs_j is also present in rhs_g(j).⁴

rhs_g(j) is type-compatible with lhs_j if there is at least one of the alternative type specifiers
for the typed variable that satisfies the conditions below.

• rhs_g(j) has the stem defined in the type specifier.

• if the type specifier doesn't have a subscript, then rhs_g(j) must have no subscript.

• if the type specifier has a subscript different from '?', then rhs_g(j) must have the same
subscript as in the type specifier.⁵

3. if lhs_j is a non-typed variable node, then there's actually no requirement: Mapping(lhs_j)
may have any length and even be empty.
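The per-node conditions above can be written as three small predicates. This is a sketch under assumptions: nodes are modelled by a hypothetical `Node` record, markers by a set of strings, and a type specifier by a (stem, subscript) pair in which the subscript "?" stands for "any subscript". None of these names come from the XTAG implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Node:
    stem: str
    subscript: Optional[str] = None           # None means "no subscript"
    markers: FrozenSet[str] = frozenset()     # subset of {"foot", "subst", "head", "NA"}


TypeSpec = Tuple[str, Optional[str]]          # (stem, subscript); "?" = any subscript


def constant_ok(lhs: Node, inp: Node) -> bool:
    """Condition 1: same name and exactly the same markers."""
    return (lhs.stem, lhs.subscript, lhs.markers) == (inp.stem, inp.subscript, inp.markers)


def marker_compatible(lhs_markers: Iterable[str], inp: Node) -> bool:
    """Condition 2a: every marker on the variable is also on the input node."""
    return set(lhs_markers) <= inp.markers


def type_compatible(specs: Iterable[TypeSpec], inp: Node) -> bool:
    """Condition 2b: at least one type specifier accepts the input node."""
    for stem, sub in specs:
        if inp.stem != stem:
            continue
        # "?" accepts any subscript; otherwise subscripts must be equal,
        # which also covers "no subscript on both sides" (None == None).
        if sub == "?" or sub == inp.subscript:
            return True
    return False
```

Note the asymmetry between the first two predicates: a constant node must agree with the input node on all markers, while a typed variable only requires its own markers to be present, so the input node may carry additional ones.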

The following algorithm, Match, takes as input a list of nodes of lhs and a list of nodes of
inp, and returns the set of possible matches generated in the attempt to match these two lists.
If the result is an empty set, this means that the matching failed.

Function Match(list_lhs, list_rhs)
    Let MAPPINGS be the list of all valid mappings from list_lhs to list_rhs
    Make MATCHES = ∅
    For each mapping Mapping ∈ MAPPINGS do:
        Make Matches = {∅}
        For each j, 1 ≤ j ≤ l, where l = length(list_lhs), do:
            if lhs_j is a constant node, then
                let children_lhs be the list of children of lhs_j
                    rhs_g(j) be the single element in Mapping(lhs_j)
                    children_rhs be the list of children of rhs_g(j)
                Make Matches = {m ∪ m_j | m ∈ Matches
                                and m_j ∈ Match(children_lhs, children_rhs)}
            if lhs_j is a typed variable node, then
                let children_lhs be the list of children of lhs_j
                    rhs_g(j) be the single element in Mapping(lhs_j)
                    children_rhs be the list of children of rhs_g(j)
                Make Matches = {{(lhs_j, rhs_g(j))} ∪ m ∪ m_j | m ∈ Matches
                                and m_j ∈ Match(children_lhs, children_rhs)}
            if lhs_j is a non-typed variable node, then
                let children_lhs be the list of children of lhs_j
                    s be the number of nodes in children_lhs
                    DESC_s be the set of s-size lists given by:
                        DESC_s = {[dr_1, dr_2, ..., dr_s] |
                            for every 1 ≤ k ≤ s, dr_k is a descendant
                                of some node in Mapping(lhs_j);
                            for every 1 < k ≤ s, dr_k is to the right of dr_(k-1)}
                For every list Desc = [dr_1, dr_2, ..., dr_s] ∈ DESC_s do:
                    Let Tree-Material be the list of subtrees dominated
                        by the nodes in Mapping(lhs_j), but with the
                        subtrees dominated by the nodes in Desc
                        cut out from these trees
                    Make Matches = {{(lhs_j, Tree-Material)} ∪ m ∪ m_j |
                                    m ∈ Matches and m_j ∈ Match(children_lhs, Desc)}
        Make MATCHES = MATCHES ∪ Matches
    Return MATCHES

4 Notice that, unlike the case for the constant node, the inverse is not required, i.e., if lhs_j has no marker,
rhs_g(j) is still allowed to have some.

5 If the type specifier has a '?' subscript, there is no restriction, and that is exactly its function: to allow for
the matching to be independent of the subscript.

Finally we can define the process of structurally matching lhs to inp as the evaluation of
Match([root(lhs)], [root(inp)]). If the result is an empty set, the matching failed; otherwise the
resulting set is the set of possible matches that will be used for generating the new trees (after
being pruned by the feature equation matching).

B.2.3 Output Generation 

Although nothing has yet been said about the feature equations, which is the subject of the 
next subsection, we assume that only matches that meet the additional constraints imposed by 
feature equations are considered for output. If no structural match survives feature equations 
checking, that matching has failed. 

If the process of matching lhs to inp fails, there are two alternative behaviors according to
the value of a parameter.⁸ If the parameter is set to false, which is the default value, no output
is generated. On the other hand, if it is set to true, then the inp tree itself is copied to the
output.

8 The parameter is accessible at the Lisp interface by the name XTAG::*metarules-copy-unmatched-trees*.
If the process of matching succeeds, as many trees will be generated in the output as the
number of possible matches obtained in the process. For a given match, the output tree is
generated by substituting in the rhs tree of the metarule the occurrences of variables by the
material to which they have been instantiated in the match. The case of the typed variable
is simple. The name of the variable is just substituted by the name of the node to which it
has been instantiated from inp. A very important detail is that the marker (foot, substitution,
head, NA, or none) at the output tree node comes from what is specified in the rhs node, which
can be different from the marker at the variable node in lhs and from that of the associated node in inp.

The case of the non-typed variable, not surprisingly, is not so simple. In the output tree, this
node will be substituted by the subtree list that was associated to this node, in the same order,
attaching to the parent of this non-typed variable node. But remember that some subtrees may
have been removed from some of the trees in this list, maybe entire elements of this list, due
to the effect of the children of the metavariable in lhs. It is a requirement that any occurrence
of a non-typed variable node at the rhs tree has exactly the same number of children as the
unique occurrence of this non-typed variable node in lhs. Hence, when generating the output
tree, the subtrees at rhs will be inserted exactly at the points where subtrees were removed
during matching, in a positional, one-to-one correspondence.

For feature equations in the output trees see the next subsection. The comments at the
output are the comments at the lhs tree of the metarule followed by the comments at inp, both
parts introduced by appropriate headers, allowing the user to have a complete history of each
tree.

B.2.4 Feature Matching 

In the previous subsections we have considered only the aspects of a metarule involving the
structural part of the XTAG trees. In a feature-based grammar such as XTAG, accounting for
features is essential. A metarule is of little use if it doesn't account for the proper change
of feature equations⁹ from the input to the output tree. The aspects that have to be considered
here are:

• Which feature equations should be required to be present in inp in order for the match 
to succeed. 

• Which feature equations should be generated in the output tree as a function of the feature 
equations in the input tree. 

Based on the possible combinations of these requirements we partition the feature equations
into the following five classes¹⁰:

• Require & Retain: Feature equations in this class are required to be in inp in order for 
matching to succeed. Upon matching, these equations will be copied to the output tree. 

9 Notice that what is really important is not the features themselves, but the feature equations that relate the 
feature values of nodes of the same tree 

10 This classification is really a partition, i.e., no equation may be conceptually in more than one class at the 
same time. 
To achieve this behaviour, the equation must be placed in the lhs tree of the metarule
preceded by a plus character (e.g. +V.t:<trans> =

• Require & Don't Copy: The equation is required to be in inp for matching, but should
not be copied to the output tree. Those equations must be in lhs preceded by a minus
character (e.g. -NP_1:<case> = acc).

• Optional & Don't Copy: The equation is not required for matching, but we have to make
sure not to copy it to the output tree set of equations, regardless of it being present or
not in inp. Those equations must be in lhs in raw form, i.e. preceded by neither a plus
nor a minus character (e.g. S_r.b:<perfect> = VP.t:<perfect>).

• Optional & Retain: The equation is not required for matching but, in case it is found 
in inp it must be copied to the output tree. This is the default case, and hence these 
equations should not be present in the metarule specification. 

• Add: The equation is not required for matching but we want it to be put in the output 
tree anyway. These equations are placed in raw form in the rhs (notice in this case it is 
the right hand side). 
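The five classes can be read as a small procedure over sets of equations. The sketch below is only an illustration of that reading: equations are treated as already normalized strings compared literally, so commutativity and variables in equations are not handled, and the function name `apply_feature_rules` as well as the equations in the usage note are hypothetical.

```python
from typing import List, Optional, Sequence


def apply_feature_rules(
    inp_eqs: Sequence[str],
    lhs_eqs: Sequence[str],
    rhs_eqs: Sequence[str],
) -> Optional[List[str]]:
    """Return the output equations, or None if a required equation is missing."""
    required, dropped = set(), set()
    for eq in lhs_eqs:
        if eq.startswith("+"):        # Require & Retain
            required.add(eq[1:])
        elif eq.startswith("-"):      # Require & Don't Copy
            required.add(eq[1:])
            dropped.add(eq[1:])
        else:                         # Optional & Don't Copy
            dropped.add(eq)
    if not required <= set(inp_eqs):
        return None                   # feature matching fails
    # Optional & Retain is the default: whatever is not dropped is copied.
    out = [eq for eq in inp_eqs if eq not in dropped]
    # Add: raw equations of rhs are put in the output in any case.
    out += [eq for eq in rhs_eqs if eq not in out]
    return out
```

For example, with input equations ["V.t:<mode> = ind", "NP_1:<case> = acc", "S_r.b:<perfect> = VP.t:<perfect>", "S_r.b:<agr> = VP.t:<agr>"], lhs equations ["+V.t:<mode> = ind", "-NP_1:<case> = acc", "S_r.b:<perfect> = VP.t:<perfect>"] and rhs equations ["NP_w.t:<wh> = +"], the output keeps the mode and agr equations, drops the case and perfect equations, and adds the wh equation.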



Typed variables can be used in feature equations in both lhs and rhs. They are intended to 
represent the nodes of the input tree to which they have been instantiated. For each resulting 
match from the structural matching process the following is done: 

• The (typed) variables in the equations at lhs and rhs are substituted by the names of 
the nodes they have been instantiated to. 

• The requirements concerning feature equations are checked, according to the above rules. 

• If the match survives feature equation checking, the proper output tree is generated,
according to Section B.2.3 and to the rules described above for the feature equations.

Finally, a new kind of metavariable, which is not used at the nodes, can be introduced in
the feature equations part. These variables have the same form as the non-typed variables, i.e. a
question mark followed by a number, and are used in the place of feature values and feature names.
Hence, if the equation ?NP?.b:<?2> = ?3 appears in lhs, this means that all feature equations
of inp that match a bottom attribute of some NP to any feature value (but not to a feature
path) will not be copied to the output.



B.3 Examples 

Figure B.1 shows a metarule for wh-movement of the subject. Among the trees to which it has
been applied are the basic trees of intransitive, transitive and ditransitive families (including
prepositional complements), passive trees of the same families, and ergative.

11 Commutativity of equations is accounted for in the system. Hence an equation x = y can also be specified
as y = x. Associativity is not accounted for, and its need by a user is viewed as indicating misspecification at
the input trees.
Figure B.1: Metarule for wh-movement of subject



Figure B.2 shows a metarule for wh-movement of an NP in object position. Among the
trees to which it has been applied are the basic and passive trees of transitive and ditransitive
families.




Figure B.2: Metarule for wh-movement of object 



Figure B.3 shows a metarule for general wh-movement of an NP. It can be applied to
generate trees with either subject or object NP moved. Figure B.4 shows the basic tree
for the family Tnx0Vnx1Pnx2 and the three wh-trees generated by the application of the rule.

Figure B.3: Metarule for general wh-movement of an NP

Figure B.4: Application of wh-movement rule to Tnx0Vnx1Pnx2 (panels: nx0Vnx1Pnx2; subject
moved; NP object moved; NP object moved from PP)

B.4 The Access to the Metarules through the XTAG Interface

We first describe the access to the metarules subsystem using buffers with single metarule
applications. Then we proceed by describing the application of multiple metarules in what we
call the parallel, sequential, and cumulative modes to input tree files.

We have defined conceptually a metarule as an ordered pair of trees. The implementation
of the metarule subsystem works the same way: a metarule is a buffer with two trees. The name
of the metarule is the name of the buffer. The first tree that appears in the main window under
the metarule buffer is the left-hand side; the next one, appearing below it, is the right-hand side.¹² The
positional approach allows us to have naming freedom: the tree names are irrelevant.¹³ Since
we can save buffers into text files, we can also talk about metarule files.

12 Although a buffer is intended to implement the concept of a set (not a sequence) of trees, we take advantage of
the actual organization of the system to realize the concept of (ordered) tree pair in the implementation.

13 so that even if we want to have mnemonic names resembling their distinct character - left or right hand side,
The available options for applying a metarule which is in a buffer are:

• For applying it to a single input tree, click on the name of the tree in the main window,
and choose the option apply metarule to tree. You will be prompted for the name of
the metarule to apply to the tree, which should be, as we mentioned before, the name of
the buffer that contains the metarule trees. The output trees will be generated at the
end of the buffer that contains the input tree. The names of the trees depend on a LISP
parameter, *metarules-change-name*. If the value of the parameter is false (the default
value), then the new trees will have the same name as the input; otherwise, they will have the name of
the input tree followed by a dash ('-') and the name of the right-hand side tree of the metarule.¹⁴

The value of the parameter can be changed by choosing Tools at the menu bar and then
either name mr output trees = input or append rhs name to mr output trees.

• For applying it to all the trees of a buffer, click on the name of the buffer that contains
the trees and proceed as above. The output will be a new buffer with all the output trees.
The name of the new buffer will be the same as the input buffer prefixed by "MR-". The
names of the trees follow the conventions above.

The other options concern application to files (instead of buffers). Let's first define the
concepts of parallel, sequential and cumulative application of metarules. One metarule file can
contain more than one metarule. The first two trees, i.e., the first tree pair, form one metarule;
let's call it mr_0. Subsequent pairs in the sequence of trees define additional metarules: mr_1,
mr_2, ..., mr_n.

• We say that a metarule file is applied in parallel to a tree (see Figure B.5) if each of the
metarules is applied independently to the input, generating its particular output trees.¹⁵
We generalize the concept to the application in parallel of a metarule file to a tree file
(with possibly more than one tree), generating all the trees as if each metarule in the
metarule file was applied to each tree in the input file.



Figure B.5: Parallel application of metarules



- we have some naming flexibility to call them e.g. lhs23 or lhs-passive, ...

14 The reason why we do not use the name of the metarule, i.e. the name of the buffer, is that in some forms
of application the metarules do not carry individual names, as we'll see soon is the case when a set of metarules
from a file is applied.

15 Remember that a metarule application generates as many output trees as the number of matches.
• We say that a metarule file mr_0, mr_1, mr_2, ..., mr_n is applied in sequence to an input tree
file (see Figure B.6) if we apply mr_0 to the trees of the input file, and for each 0 < i ≤ n
apply metarule mr_i to the trees generated as a result of the application of mr_(i-1).



Figure B.6: Sequential application of metarules



• Finally, the cumulative application is similar to the sequential, except that the input trees
at each stage are passed through to the output together with the newly generated ones (see
Figure B.7).



Figure B.7: Cumulative application of metarules



Remember that in case of matching failure the output result is decided as explained in
Subsection B.2.3: it is either empty or the input tree. The consequence of having the
parameter set for copying the input is that, for the parallel application, the output will have as
many copies of the input as matching failures. For the sequential case the decision applies at
each level, and setting the parameter for copying, in a certain sense, guarantees that the 'pipe'
does not break. Due to its nature and unlike the two other modes, the cumulative application is
not affected by this parameter.
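The three modes, together with the effect of the copy-on-failure parameter, can be sketched as follows. This is a toy model rather than the XTAG code: a metarule is represented by a hypothetical function from one tree to the list of its output trees (empty on matching failure), trees are plain strings, and `copy_unmatched` stands in for the parameter described in Subsection B.2.3.

```python
from typing import Callable, List, Sequence

Tree = str
Metarule = Callable[[Tree], List[Tree]]   # returns [] on matching failure


def apply_one(mr: Metarule, tree: Tree, copy_unmatched: bool = False) -> List[Tree]:
    out = mr(tree)
    if not out and copy_unmatched:
        return [tree]                     # copy the input on matching failure
    return out


def parallel(mrs: Sequence[Metarule], trees: Sequence[Tree],
             copy_unmatched: bool = False) -> List[Tree]:
    # Every metarule is applied independently to every input tree.
    return [o for t in trees for mr in mrs for o in apply_one(mr, t, copy_unmatched)]


def sequential(mrs: Sequence[Metarule], trees: Sequence[Tree],
               copy_unmatched: bool = False) -> List[Tree]:
    # The output of mr_(i-1) is the input of mr_i.
    current = list(trees)
    for mr in mrs:
        current = [o for t in current for o in apply_one(mr, t, copy_unmatched)]
    return current


def cumulative(mrs: Sequence[Metarule], trees: Sequence[Tree]) -> List[Tree]:
    # Like sequential, but each stage also passes its input through.
    current = list(trees)
    for mr in mrs:
        current = current + [o for t in current for o in mr(t)]
    return current


# Toy metarules: "passive" only matches trees whose name starts with "tv".
def passive(t: Tree) -> List[Tree]:
    return [t + "+pass"] if t.startswith("tv") else []


def wh(t: Tree) -> List[Tree]:
    return [t + "+wh"]
```

With the input trees ["iv", "tv"] and the metarules [passive, wh], sequential application yields only "tv+pass+wh", because the intransitive tree drops out of the pipe at the first stage. With copying enabled it yields "iv+wh" and "tv+pass+wh". Cumulative application yields six trees: the two inputs, "tv+pass", and the wh versions of all three.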

The options for application of metarules to files are available by clicking on the menu item
Tools and then choosing the appropriate function among:

• Apply metarule to files: You'll be prompted for the metarule file name, which should
contain one metarule (if it contains more than 2 trees, the additional trees are ignored),
and for input file names. Each input file inpfile will be independently submitted to the
application of the metarule, generating an output file with the name MR-inpfile.

• Apply metarules in parallel to files: You'll be prompted for the metarules file name with
one or more metarules and for input file names. Each input file inpfile will be
independently submitted to the application of the metarules in parallel. For each parallel
application to a file inpfile an output file with the name MRP-inpfile will be generated.

• Apply metarules in sequence to files: The interaction is as described for the application in
parallel, except that the metarules are applied in sequence and that the output
files are prefixed by MRS- instead of MRP-.

• Apply metarules cumulatively to files: The interaction is as described for the applications
in parallel and in sequence, except that the mode of application is cumulative and that
the output files are prefixed by MRC-.



Finally, still under the Tools menu, we can change the setting of the parameter that controls
the output on matching failure (see Subsection B.2.3) by choosing either copy input on
mr matching failure or no output on mr matching failure.



Appendix C 

Lexical Organization 



C.1 Introduction

An important characteristic of an FB-LTAG is that it is lexicalized, i.e., each lexical item is
anchored to a tree structure that encodes subcategorization information. Trees with the same
canonical subcategorizations are grouped into tree families. The reuse of tree substructures,
such as wh-movement, in many different trees creates redundancy, which poses a problem for
grammar development and maintenance [Vijay-Shanker and Schabes, 1992]. To consistently
implement a change in some general aspect of the design of the grammar, all the relevant
trees currently must be inspected and edited. Vijay-Shanker and Schabes suggested the use of
hierarchical organization and of tree descriptions to specify substructures that would be present
in several elementary trees of a grammar. Since then, in addition to ourselves, Becker
[Becker, 1994], Evans et al. [Evans et al., 1995], and Candito [Candito, 1996] have developed
systems for organizing trees of a TAG which could be used for developing and maintaining grammars.



Our system is based on the ideas expressed in Vijay-Shanker and Schabes
[Vijay-Shanker and Schabes, 1992], to use partial-tree descriptions in specifying a grammar by
separately defining pieces of tree structures to encode independent syntactic principles. Various
individual specifications are then combined to form the elementary trees of the grammar. The
chapter begins with a description of our grammar development system and its implementation. We
then show the main results of using this tool to generate the Penn English grammar as
well as a Chinese TAG. We describe the significant properties of both grammars, pointing out
the major differences between them, and the methods by which our system is informed about
these language-specific properties. The chapter ends with the conclusion and future work.



C.2 System Overview 

In our approach, three types of components - subcategorization frames, blocks and lexical
redistribution rules - are used to describe lexical and syntactic information. Actual trees
are generated automatically from these abstract descriptions, as shown in Figure C.1. In
maintaining the grammar only the abstract descriptions need ever be manipulated; the tree
descriptions and the actual trees which they subsume are computed deterministically from
these high-level descriptions.
Figure C.1: Lexical Organization: System Overview (inputs: subcategorization frames, lexical
redistribution rules (LRRs), subcategorization blocks, transformation blocks; output: sets of trees)



C.2.1 Subcategorization frames 

Subcategorization frames specify the category of the main anchor, the number of arguments, 
each argument's category and position with respect to the anchor, and other information such 
as feature equations or node expansions. Each tree family has one canonical subcategorization 
frame. 



C.2.2 Blocks 

Blocks are used to represent the tree substructures that are reused in different trees, i.e. blocks
subsume classes of trees. Each block includes a set of nodes, the dominance, parent, and
precedence relations between nodes, and feature equations. This follows the definition of the tree
descriptions specified in a logical language patterned after Rogers and Vijay-Shanker
[Rogers and Vijay-Shanker, 1994].

Blocks are divided into two types according to their functions: subcategorization blocks
and transformation blocks. The former describe structural configurations incorporating the
various information in a subcategorization frame. For example, some of the subcategorization
blocks used in the development of the English grammar are shown in Figure C.2 (see footnote 1).



When the subcategorization frame for a verb is given by the grammar developer, the system
will automatically create a new block (of code) by essentially selecting the appropriate
primitive subcategorization blocks corresponding to the argument information specified in that
verb frame.
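As a rough illustration of how primitive subcategorization blocks might be selected and combined for a verb frame, the following Python sketch models a block as sets of nodes, relations and feature equations. It is not the system's actual code (the report states that tree generation is written in Prolog); the block contents, the frame layout, and the names `Block`, `make_block` and `block_for_frame` are all assumptions made for the example.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Block:
    """A partial tree description (hypothetical encoding)."""
    nodes: frozenset = frozenset()
    parent: frozenset = frozenset()      # (mother, daughter) pairs
    dominance: frozenset = frozenset()   # (ancestor, descendant) pairs
    precedence: frozenset = frozenset()  # (left, right) pairs
    equations: frozenset = frozenset()   # feature equations as strings

    def combine(self, other: "Block") -> "Block":
        # Blocks only add constraints, so combining is a field-wise union.
        return Block(
            self.nodes | other.nodes,
            self.parent | other.parent,
            self.dominance | other.dominance,
            self.precedence | other.precedence,
            self.equations | other.equations,
        )


def make_block(nodes, parent=(), dominance=(), precedence=(), equations=()):
    return Block(frozenset(nodes), frozenset(parent), frozenset(dominance),
                 frozenset(precedence), frozenset(equations))


# Invented contents, loosely following the block names in Figure C.2.
PRIMITIVES = {
    "main_pred_is_V": make_block({"Pred", "V"}, dominance={("Pred", "V")}),
    "pred_has_subject": make_block({"Root", "Subject", "Pred"},
                                   parent={("Root", "Subject")},
                                   dominance={("Root", "Pred")},
                                   precedence={("Subject", "Pred")}),
    "subject_is_NP": make_block({"Subject"}, equations={"Subject.cat = NP"}),
    "pred_has_object": make_block({"Pred", "V", "Object"},
                                  parent={("Pred", "Object")},
                                  precedence={("V", "Object")}),
    "object_is_NP": make_block({"Object"}, equations={"Object.cat = NP"}),
}


def block_names_for_frame(frame):
    """Select primitive block names from the frame's argument information."""
    names = ["main_pred_is_" + frame["anchor"]]
    for role, category in frame["args"].items():
        names += ["pred_has_" + role, role + "_is_" + category]
    return names


def block_for_frame(frame, names=None):
    names = names or block_names_for_frame(frame)
    return reduce(Block.combine, (PRIMITIVES[n] for n in names))


transitive = {"anchor": "V", "args": {"subject": "NP", "object": "NP"}}
combined = block_for_frame(transitive)
print(sorted(combined.nodes))
```

Because the combination in this sketch is a plain set union, the order in which the primitive blocks are added does not affect the result.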

The transformation blocks are used for various transformations such as wh-movement.
These transformation blocks do not encode rules for modifying trees, but rather describe the
properties of a particular syntactic construction. Figure C.3 depicts our representation of
phrasal extraction. This can be specialized to give the blocks for wh-movement, topicalization,
relative clause formation, etc. For example, the wh-movement block is defined by further
specifying that the ExtractionRoot is labeled S, the NewSite has a +wh feature, and so on.

1 In order to focus on the use of tree descriptions and to make the figures less cumbersome, we show only
the structural aspects and do not show the feature value specification. The parent (immediate dominance)
relationship is illustrated by a plain line and the dominance relationship by a dotted line. An arc between nodes
shows that the precedence order of the nodes is unspecified. The nodes' categories are enclosed in parentheses.



Figure C.2: Some subcategorization blocks (surviving panel labels: (b) pred_has_subject,
(c) subject_is_NP, (d) main_pred_is_V, (e) pred_has_object)

Figure C.3: Transformation blocks for extraction (surviving panel labels: (b) wh-movement,
(c) relative clause)
C.2.3 Lexical Redistribution Rules (LRRs)

The third type of machinery available for a grammar developer is the Lexical Redistribution
Rule (LRR). An LRR is a pair (r_l, r_r) of subcategorization frames, which produces a new frame
when applied to a subcategorization frame s, by first matching the left frame r_l of r to s, then
combining information in r_r and s. LRRs are introduced to incorporate the connection between
subcategorization frames. For example, most transitive verbs have a frame for active (a subject
and an object) and another frame for passive, where the object in the former frame becomes
the subject in the latter. An LRR, denoted as the passive LRR, is built to produce the passive
subcategorization frame from the active one. Similarly, applying the dative-shift LRR to the frame
with one NP subject and two NP objects will produce a frame with an NP subject and a PP
object.

Besides the distinct content, LRRs and blocks also differ in several aspects: 

• They have different functionalities: Blocks represent the substructures that are reused in 
different trees. They are used to reduce the redundancy among trees; LRRs are introduced 
to incorporate the connections between the closely related subcategorization frames. 

• Blocks are strictly additive and can be added in any order. LRRs, on the other hand,
produce different results depending on the order in which they are applied, and are allowed
to be non-additive, i.e., to remove information from the subcategorization frame they are
being applied to, as when the passive frame is derived from the active one.
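The matching-then-combining behaviour of an LRR, and its order sensitivity, can be sketched in Python as below. This is a toy model under stated assumptions, not the system's representation: a frame is a dictionary from hypothetical function names (`subject`, `object`, `object2`, `pp_object`) to argument labels, the left and right frames of a rule use variables `X` and `Y`, and the two rules shown are simplified stand-ins for the passive and dative-shift LRRs.

```python
def apply_lrr(left, right, frame):
    """Match `left` against `frame`; on success build a new frame from
    `right` plus whatever `left` did not mention. Returns None on failure."""
    if left["anchor"] != frame["anchor"]:
        return None
    bindings = {}
    for function, variable in left["args"].items():
        if function not in frame["args"]:
            return None                       # matching failure
        bindings[variable] = frame["args"][function]
    # Arguments not mentioned by the left frame are carried over unchanged.
    rest = {f: a for f, a in frame["args"].items() if f not in left["args"]}
    # Arguments mentioned on the left but absent on the right are dropped,
    # which is what makes an LRR non-additive.
    new = {f: bindings[v] for f, v in right["args"].items()}
    return {"anchor": frame["anchor"], "args": {**rest, **new}}


# Toy passive LRR: the object becomes the subject, the old subject is dropped.
PASSIVE = ({"anchor": "V", "args": {"subject": "X", "object": "Y"}},
           {"anchor": "V", "args": {"subject": "Y"}})

# Toy dative-shift LRR: the first object is re-expressed as a PP object.
DATIVE_SHIFT = ({"anchor": "V", "args": {"object": "X", "object2": "Y"}},
                {"anchor": "V", "args": {"object": "Y", "pp_object": "X"}})

active = {"anchor": "V", "args": {"subject": "NP0", "object": "NP1"}}
ditransitive = {"anchor": "V",
                "args": {"subject": "NP0", "object": "NP1", "object2": "NP2"}}

print(apply_lrr(*PASSIVE, active))

shifted = apply_lrr(*DATIVE_SHIFT, ditransitive)
print(apply_lrr(*PASSIVE, shifted))        # dative shift, then passive

passivized = apply_lrr(*PASSIVE, ditransitive)
print(apply_lrr(*DATIVE_SHIFT, passivized))  # passive, then dative shift
```

In this toy encoding, applying dative shift before passive yields a frame, whereas the reverse order fails to match, which illustrates why the order of LRR application matters.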




Figure C.4: Elementary trees generated from combining blocks (panels (a), (b), (c))

C.2.4 Tree generation 

To generate elementary trees, we begin with a canonical subcategorization frame. The system
first generates related subcategorization frames by applying LRRs, then selects the
subcategorization blocks corresponding to the information in the subcategorization frames. Next,
the combinations of these blocks are further combined with the blocks corresponding to various
transformations. Finally, a set of trees is generated from those combined blocks; these trees
form the tree family for this subcategorization frame. Figure C.4 shows some of the trees produced
in this way. For instance, the last tree is obtained by incorporating information from the
ditransitive verb subcategorization frame, applying the dative-shift and passive LRRs, and then
combining them with the wh non-subject extraction block. In addition, in our system the hierarchy
for subcategorization frames is implicit, as shown in Figure C.5.

2 Matching occurs successfully when frame s is compatible with r_l in the type of anchors, the number of
arguments, their positions, categories and features. In other words, incompatible features etc. will block certain
LRRs from being applied.



Figure C.5: Partial inheritance lattice in English (blocks shown: subject_is_NP, object_is_NP,
object_is_AP, main_anchor_is_verb; frames shown include Intransitive (walk), Intrans. with adj
(feel happy), Transitive (buy), Intrans. particle (add up), Trans. particle, Trans. idioms
(kick the bucket), Ditrans. with S (force sb to do sth), Trans. light verb (take a walk))



C.3 Implementation 

The input of our system is the description of the language, which includes the subcategorization
frame list, LRR list, subcategorization block list and transformation block list. The output is a
list of trees generated automatically by the system, as shown in Figure C.6. The tree generation
module is written in Prolog, and the rest is in C. We also have a graphical interface for entering
the language description. Figures C.7 and C.8 are two snapshots of the interface.



C.4 Generating grammars 

We have used our tool to specify a grammar for English in order to produce the trees used in
the current English XTAG grammar. We have also used our tool to generate a large grammar
for Chinese. In designing these grammars, we have tried to specify the grammars to reflect the
similarities and the differences between the languages. The major features of our specification
of these two grammars (see footnote 3) are summarized in Tables C.1 and C.2.



3 Both grammars are still under development, so the contents of these two tables might change a lot in the
future according to the analyses we choose for certain phenomena. For example, the majority of work on
Chinese grammar treats the ba-construction as some kind of object-fronting where the character ba is either an
object marker or a preposition. According to this analysis, an LRR for the ba-construction is used in our grammar
to generate the preverbal-object frame from the postverbal frame. However, there has been some argument for
treating ba as a verb. If we later choose that analysis, the main verbs in the patterns "NP0 VP" and "NP0 ba
NP1 VP" will be different, and therefore no LRR will be needed for it. As a result, the numbers of LRRs, subcat
frames and trees generated will change accordingly.
Figure C.6: Implementation of the system

Figure C.7: Interface for creating a grammar



By focusing on the specification of individual grammatical information, we have been able
to generate nearly all of the trees from the tree families used in the current English grammar
developed at Penn (see footnote 4). Our approach has also exposed certain gaps in the Penn
grammar. We are encouraged by the utility of our tool and the ease with which this large-scale
grammar was developed.

We are currently working on expanding the contents of subcategorization frames to include
trees for other categories of words. For example, a frame which has no specifier and one NP
complement and whose predicate is a preposition will correspond to the PP → P NP tree. We'll
also introduce a modifier field and semantic features, so that the head features will propagate
from modifiee to modified node, while non-head features from the predicate as the head of the
modifier will be passed to the modified node.

4 We have not yet attempted to extend our coverage to include punctuation, it-clefts, and a few idiosyncratic
analyses.

Figure C.8: Part of the Interface for creating blocks

                        English   Chinese
# LRRs                        6        12
# subcat blocks              34        24
# trans blocks                8        15
# subcat frames              43        23
# trees generated           638       280

Table C.1: Major features of English and Chinese grammars

Table C.2: Comparison of the two grammars (only fragments of the table body survive: object
fronting, ba-construction, topicalization, relativization, argument-drop, and a row label ending
in "of transformation blocks")

C.5 Summary 

We have described a tool for grammar development in which tree descriptions are used to
provide an abstract specification of the linguistic phenomena relevant to a particular language.
In grammar development and maintenance, only the abstract specifications need to be edited,
and any changes or corrections will automatically be propagated throughout the grammar. In
addition to lightening the more tedious aspects of grammar maintenance, this approach also
allows a unique perspective on the general characteristics of a language. Defining hierarchical
blocks for the grammar both necessitates and facilitates an examination of the linguistic
assumptions that have been made with regard to feature specification and tree-family definition.
This can be very useful for gaining an overview of the theory that is being implemented and
exposing gaps that remain unmotivated and need to be investigated. The types of gaps that
can be exposed include a missing subcategorization frame that might arise from the
automatic combination of blocks and which would correspond to an entire tree family, a missing
tree which would represent a particular type of transformation for a subcategorization frame,
or inconsistent feature equations. By focusing on syntactic properties at a higher level, our
approach allows new opportunities for the investigation of how languages relate to themselves
and to each other.



Appendix D 

Tree Naming conventions 



The various trees within the XTAG grammar are named more or less according to the following 
tree naming conventions. Although these naming conventions are generally followed, there are 
occasional trees that do not strictly follow these conventions. 

D.1 Tree Families

Tree families are named according to the basic declarative tree structure in the tree family (see
section D.2), but with a T as the first character instead of an α or β.

D.2 Trees within tree families 

Each tree begins with either an α (alpha) or a β (beta) symbol, indicating whether it is an
initial or auxiliary tree, respectively. Following an α or a β the name may additionally contain
one of:

I          imperative
E          ergative
N0,1,2     relative clause {position}
G          NP gerund
D          Determiner gerund
pW0,1,2    wh-PP extraction {position}
W0,1,2     wh-NP extraction {position}
X          ECM (exceptional case marking)



Numbers are assigned according to the position of the argument in the declarative tree, as
follows:

0 subject position

1 first argument (e.g. direct object)

2 second argument (e.g. indirect object)

The body of the name consists of a string of the following components, which correspond to
the leaves of the tree. The anchor(s) of the tree is (are) indicated by capitalizing the part of
speech corresponding to the anchor.
s       sentence
a       adjective
arb     adverb
be      be
c       relative complementizer
x       phrasal category
d       determiner
v       verb
lv      light verb
conj    conjunction
comp    complementizer
it      it
n       noun
p       preposition
to      to
pl      particle
by      by
neg     negation



As an example, the transitive declarative tree consists of a subject NP, followed by a verb (which
is the anchor), followed by the object NP. This translates into αnx0Vnx1. If the subject NP
had been extracted, then the tree would be αW0nx0Vnx1. A passive tree with the by phrase in
the same tree family would be αnx1Vbynx0. Note that even though the object NP has moved
to the subject position, it retains the object encoding (nx1).
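The way such names are composed can be made concrete with a small helper. The following Python sketch is only an illustration of the convention as described here; the function `tree_name` and its parameters are hypothetical and not part of the XTAG tools.

```python
ALPHA, BETA = "\u03b1", "\u03b2"  # initial vs. auxiliary tree markers


def tree_name(leaves, anchors, kind=ALPHA, prefix=""):
    """Compose a tree name from its leaf components (left to right).

    leaves  -- component strings such as "nx0", "v", "by"
    anchors -- indices into `leaves` whose component is an anchor
               (anchors are written in capitals)
    prefix  -- optional marker such as "W0" for wh-NP subject extraction
    """
    body = "".join(leaf.upper() if i in anchors else leaf
                   for i, leaf in enumerate(leaves))
    return kind + prefix + body


# The three examples from the text; the verb is the anchor in each case.
print(tree_name(["nx0", "v", "nx1"], anchors={1}))               # declarative
print(tree_name(["nx0", "v", "nx1"], anchors={1}, prefix="W0"))  # subject wh
print(tree_name(["nx1", "v", "by", "nx0"], anchors={1}))         # passive
```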

D.3 Assorted Initial Trees 

Trees that are not part of the tree families are generally gathered into several files for
convenience. The various initial trees are located in lex.trees. All the trees in this file should
begin with an α, indicating that they are initial trees. This is followed by the root category, which
follows the naming conventions in the previous section (e.g. n for noun, x for phrasal category).
The root category is in all capital letters. After the root category, the leaf nodes are named,
beginning from the left, with the anchor of the tree also being capitalized. As an example, the
αNXN tree is rooted by an NP node (NX) and anchored by a noun (N).

D.4 Assorted Auxiliary Trees 

The auxiliary trees are mostly located in the buffers prepositions.trees, conjunctions.trees,
determiners.trees, advs-adjs.trees, and modifiers.trees, although a couple of other files
also contain auxiliary trees. The auxiliary trees follow a slightly different naming convention
from the initial trees. Since the root and foot nodes must be the same for the auxiliary trees,
the root nodes are not explicitly mentioned in the names of auxiliary trees. The trees are
named according to the leaf nodes, starting from the left, and capitalizing the anchor node. All
auxiliary trees begin with a β, of course. For example, βARBs indicates a tree anchored by
an adverb (ARB) that adjoins onto the left of an S node (note that S must be the foot node,
and therefore also the root node).

D.4.1 Relative Clause Trees 

For relative clause trees, the following naming conventions have been adopted: if the
wh-moved NP is overt, it is not explicitly represented. Instead the index of the site of movement
(0 for subject, 1 for object, 2 for indirect object) is appended to the N. So βN0nx0Vnx1
is a subject extraction relative clause with NP_w substitution and βN1nx0Vnx1 is an object
extraction relative clause. If the wh-moved NP is covert and Comp substitutes in, the Comp
node is represented by c in the tree name and the index of the extraction site follows c. Thus
βNc0nx0Vnx1 is a subject extraction relative clause with Comp substitution. Adjunct trees
are similar, except that since the extracted material is not co-indexed to a trace, no index
is specified (cf. βNpxnx0Vnx1, which is an adjunct relative clause with PP pied-piping, and
βNcnx0Vnx1, which is an adjunct relative clause with Comp substitution). Cases of pied-piping
in which the pied-piped material is part of the anchor have the anchor capitalized or
spelled out (cf. βNbynx0nx1Vbynx0, which is a relative clause with by-phrase pied-piping and
NP_w substitution).



Appendix E 

Features 



Table E.1 contains a comprehensive list of the features in the XTAG grammar and their possible
values.

This section consists of short 'biographical' sketches of the various features currently in use 
in the XTAG English grammar. 



E.1 Agreement

<agr> is a complex feature. It can have as its subfeatures:

<agr 3rdsing>, possible values: +/-

<agr num>, possible values: plur, sing

<agr pers>, possible values: 1, 2, 3

<agr gen>, possible values: masc, fem, neut

These features are used to ensure agreement between a verb and its subject.

Where does it occur:

Nouns come specified from the lexicon with their <agr> features, e.g. books is <agr 3rdsing>: -,
<agr num>: plur, and <agr pers>: 3. Only pronouns use the <gen> (gender) feature.

The <agr> features of a noun are transmitted up the NP tree by the following equation:
NP.b:<agr> = N.t:<agr>

Agreement between a verb and its subject is mediated by the following feature equations:

(509) NP_subj:<agr> = VP.t:<agr>

(510) VP.b:<agr> = V.t:<agr>

Agreement has to be done as a two step process because whether the verb agrees with the
subject or not depends upon whether some auxiliary verb adjoins in and upon what the <agr>
specification of the verb is.

Verbs also come specified from the lexicon with their <agr> features, e.g. the <agr> features
of the verb sings are <agr 3rdsing>: +, <agr num>: sing, and <agr pers>: 3. Non-finite forms
of the verb sing, e.g. singing, do not come with an <agr> feature specification.
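The effect of equations (509) and (510) when no auxiliary adjoins can be illustrated with a minimal unification sketch in Python. It assumes flat dictionaries for <agr> values and treats the top and bottom features of VP as simply unified; the function `unify` and the lexical entries (including the entry for boy, which is not from the text) are illustrative assumptions, not the XTAG implementation.

```python
def unify(fs1, fs2):
    """Unify two flat feature structures; return None on a value clash."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None
        result[feature] = value
    return result


# <agr> values as they might come from the lexicon.
books = {"3rdsing": "-", "num": "plur", "pers": "3"}
boy = {"3rdsing": "+", "num": "sing", "pers": "3"}    # assumed entry
sings = {"3rdsing": "+", "num": "sing", "pers": "3"}
singing = {}  # non-finite form: no <agr> specification

# With no auxiliary adjoined, VP.t and VP.b unify, so the subject's <agr>
# must unify with the verb's <agr>.
print(unify(boy, sings))      # succeeds
print(unify(books, sings))    # None: agreement clash
print(unify(books, singing))  # succeeds: nothing to clash with
```

The last case succeeds as far as agreement is concerned; in this sketch nothing else is modeled, so any further restriction on subjects of non-finite verbs is outside its scope.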
Feature                    Value
<agr 3rdsing>              +,-
<agr num>                  plur,sing
<agr pers>                 1,2,3
<agr gen>                  fem,masc,neuter
<assign-case>              nom,acc,none
<assign-comp>              that,whether,if,for,ecm,rel,inf_nil,ind_nil,ppart_nil,none
<card>                     +,-
<case>                     nom,acc,gen,none
<comp>                     that,whether,if,for,rel,inf_nil,ind_nil,nil
<compar>                   +,-
<compl>                    +,-
<conditional>              +,-
<conj>                     and,or,but,comma,scolon,to,disc,nil
<const>                    +,-
<contr>                    +,-
<control>                  no value, indexing only
<decreas>                  +,-
<definite>                 +,-
<displ-const>              +,-
<equiv>                    +,-
<extracted>                +,-
<gen>                      +,-
<gerund>                   +,-
<inv>                      +,-
<invlink>                  no value, indexing only
<irrealis>                 +,-
<mainv>                    +,-
<mode>                     base,ger,ind,inf,imp,nom,ppart,prep,sbjnct
<neg>                      +,-
<passive>                  +,-
<perfect>                  +,-
<pred>                     +,-
<progressive>              +,-
<pron>                     +,-
<punct bal>                dquote,squote,paren,nil
<punct contains colon>     +,-
<punct contains dash>      +,-
<punct contains dquote>    +,-
<punct contains scolon>    +,-
<punct contains squote>    +,-
<punct struct>             comma,dash,colon,scolon,nil
<punct term>               per,qmark,excl,nil
<quan>                     +,-
<refl>                     +,-
<rel-clause>               +,-
<rel-pron>                 ppart,ger,adj-clause
<select-mode>              ind,inf,ppart,ger
<super>                    +,-
<tense>                    pres,past
<trace>                    no value, indexing only
<trans>                    +,-
<weak>                     +,-
<wh>                       +,-

Table E.1: List of features and their possible values
E.1.1 Agreement and Movement

The <agr> features of a moved NP and its trace are co-indexed. This captures the fact that
movement does not disrupt a pre-existing agreement relationship between an NP and a verb.

(511) [Which boys]_i does John think [t_i are/*is intelligent]?

E.2 Case 

There are two features responsible for case assignment:
<case>, possible values: nom, acc, gen, none
<assign-case>, possible values: nom, acc, none

Case assigners (prepositions and verbs) as well as the VP, S and PP nodes that dominate
them have an <assign-case> feature. Phrases and lexical items that have case, i.e. Ns and
NPs, have a <case> feature.

Case assignment by prepositions involves the following equations:

(512) PP.b:<assign-case> = P.t:<case>

(513) NP.t:<case> = P.t:<case>

Prepositions come specified from the lexicon with their <assign-case> feature.

(514) P.b:<assign-case> = acc

Case assignment by verbs has two parts: assignment of case to the object(s) and assignment
of case to the subject. Assignment of case to the object is simpler. English verbs always assign
accusative case to their NP objects (direct or indirect). Hence this is built into the tree and
not put into the lexical entry of each individual verb.

(515) NP_object.t:<case> = acc

Assignment of case to the subject involves the following two equations.

(516) NP_subj:<case> = VP.t:<assign-case>

(517) VP.b:<assign-case> = V.t:<assign-case>

This is a two step process - the final case assigned to the subject depends upon the
<assign-case> feature of the verb as well as whether an auxiliary verb adjoins in.

Finite verbs like sings have nom as the value of their <assign-case> feature. Non-finite
verbs have none as the value of their <assign-case> feature. So if no auxiliary adjoins in, the
only subject they can have is PRO, which is the only NP with none as the value of its <case>
feature.
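The two-step assignment of subject case in (516) and (517) can be mimicked with a very small Python sketch. It rests on assumptions not spelled out above: that an adjoined auxiliary supplies the value seen at VP.t, that without adjunction VP.t simply takes the value of VP.b, and that a finite auxiliary assigns nom like other finite verbs. The function names are hypothetical.

```python
def subject_case(verb_assign_case, aux_assign_case=None):
    """Return the <assign-case> value that reaches the subject NP."""
    vp_bottom = verb_assign_case              # (517): VP.b = V.t
    if aux_assign_case is not None:
        vp_top = aux_assign_case              # assumed: auxiliary adjoins at VP
    else:
        vp_top = vp_bottom                    # no adjunction: VP.t unifies with VP.b
    return vp_top                             # (516): NP_subj:<case> = VP.t


def subject_allowed(np_case, verb_assign_case, aux_assign_case=None):
    return np_case == subject_case(verb_assign_case, aux_assign_case)


# Finite "sings" assigns nom; non-finite "singing" assigns none.
print(subject_allowed("nom", "nom"))           # nominative subject + sings
print(subject_allowed("none", "none"))         # PRO + singing
print(subject_allowed("nom", "none"))          # nominative subject + bare singing
print(subject_allowed("nom", "none", "nom"))   # nominative subject + aux + singing
```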
E.2.1 ECM

Certain verbs, e.g. want, believe, consider etc., and one complementizer, for, are able to assign
case to the subject of their complement clause.

The complementizer for, like the preposition for, has the <assign-case> feature of its
complement set to acc. Since the <assign-case> feature of the root S_r of the complement tree and
the <case> feature of its NP subject are co-indexed, this leads to the subject being assigned
accusative case.

ECM verbs have the <assign-case> feature of their foot S node set to acc. The co-indexation
between the <assign-case> feature of the root S_r and the <case> feature of the NP subject leads
to the subject being assigned accusative case.

E.2.2 Agreement and Case

The <case> features of a moved NP and its trace are co-indexed. This captures the fact that
movement does not disrupt a pre-existing relationship of case assignment between a verb and
an NP.

(518) Her_i/*She_i, I think that Odo likes t_i.

E.3 Extraction and Inversion 

<extracted>, possible values are +/-

All sentential trees with extracted components, with the exception of relative clauses, are
marked S.b:<extracted> = + at their top S node. The extracted element may be a wh-NP or
a topicalized NP. The <extracted> feature is currently used to block embedded topicalizations
as exemplified by the following example.

(519) *John wants [Bill_i [PRO to leave t_i]]

<trace>: this feature is not assigned any value and is used to co-index moved NPs and their
traces, which are marked by ε.
<wh>: possible values are +/-

NPs like who, what etc. come marked from the lexicon with a value of + for the feature
<wh>. Non wh-NPs have - as the value of their <wh> feature. Note that <wh> = + NPs
are not restricted to occurring in extracted positions, to allow for the correct treatment of echo
questions.

The <wh> feature is propagated up by possessives - e.g. the + <wh> feature of the determiner
which in which boy is propagated up to the level of the NP so that the value of the <wh> feature
of the entire NP is +. This process is recursive, e.g. which boy's mother, which boy's
mother's sister.

The <wh> feature is also propagated up PPs. Thus the PP to whom has + as the value of
its <wh> feature.

In trees with extracted NPs, the <wh> feature of the root S node is equated with the
<wh> feature of the extracted NPs.

The <wh> feature is used to impose subcategorizational constraints. Certain verbs like
wonder can only take interrogative complements, other verbs such as know can take both
interrogative and non-interrogative complements, and yet other verbs like think can only take
non-interrogative complements (the <extracted> and <mode> features also play a role in
imposing subcategorizational constraints).

The <wh> feature is also used to get the correct inversion patterns.

E.3.1 Inversion, Part 1 

The following three features are used to ensure the correct pattern of inversion:
<wh>: possible values are +/-
<inv>: possible values are +/-
<invlink>: possible values are +/-
Facts to be captured:

1. No inversion with topicalization

2. No inversion with matrix extracted subject wh-questions

3. Inversion with matrix extracted object wh-questions

4. Inversion with all matrix wh-questions involving extraction from an embedded clause

5. No inversion in embedded questions

6. No matrix subject topicalizations.

Consider a tree with object extraction, where NP is extracted. The following feature
equations are used:

(520) S_q.b:<wh> = NP.t:<wh>

(521) S_q.b:<invlink> = S_q.b:<inv>

(522) S_q.b:<inv> = S_r.t:<inv>

(523) S_r.b:<inv> = -

Root restriction: A restriction is imposed on the final root node of any XTAG derivation of
a tensed sentence which equates the <wh> feature and the <invlink> feature of the final root
node.

If the extracted NP is not a wh-word, i.e. its <wh> feature has the value -, at the end of
the derivation, S_q.b:<wh> will also have the value -. Because of the root constraint S_q.b:<wh>
will be equated to S_q.b:<invlink>, which will also come to have the value -. Then, by (521) and
(522), S_r.t:<inv> will acquire the value -. This will unify with S_r.b:<inv>, which has the value -
(cf. (523)). Consequently, no auxiliary verb adjunction will be forced. Hence, there will never be
inversion in topicalization.

If the extracted NP is a wh-word, i.e. its <wh> feature has the value +, at the end of the
derivation, S_q.b:<wh> will also have the value +. Because of the root constraint S_q.b:<wh>
will be equated to S_q.b:<invlink>, which will also come to have the value +. Then, by (521) and
(522), S_r.t:<inv> will acquire the value +. This will not unify with S_r.b:<inv>, which has the
value - (cf. (523)). Consequently, the adjunction of an inverted auxiliary verb is required for the
derivation to succeed.

Inversion will still take place even if the extraction is from an embedded clause.
(524) Who_i does Loida think [Miguel likes t_i]

This is because the adjoined tree's root node will also have its S_r.b:<inv> set to -.

Note that inversion is only forced upon us because S_q is the final root node and the Root
restriction applies. In embedded environments, the root restriction would not apply and the
feature clash that forces adjunction would not take place.

The <invlink> feature is not present in subject extractions. Consequently there is no
inversion in subject questions.

Subject topicalizations are blocked by setting the <wh> feature of the extracted NP to +,
i.e. only wh-phrases can go in this location.
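The chain of equations (520) to (523) together with the root restriction can be traced in a few lines of Python. This is a deliberately simplified sketch: feature values are plain strings, an unset feature is represented by None, and a clash between the top and bottom <inv> values of S_r is taken to mean that adjunction of an inverted auxiliary is forced. The function name and its parameters are hypothetical.

```python
def inversion_forced(np_wh, is_final_root=True, has_invlink=True):
    """Return True if the feature clash at S_r forces auxiliary adjunction.

    np_wh         -- "+" or "-": the <wh> value of the extracted NP
    is_final_root -- whether S_q is the final root (root restriction applies)
    has_invlink   -- False for subject extraction trees, which lack <invlink>
    """
    sq_wh = np_wh                                   # (520)
    # Root restriction: <wh> and <invlink> are equated only at the final root.
    sq_invlink = sq_wh if (is_final_root and has_invlink) else None
    sq_inv = sq_invlink                             # (521)
    sr_top_inv = sq_inv                             # (522)
    sr_bottom_inv = "-"                             # (523)
    # A clash between top and bottom can only be resolved by adjunction.
    return sr_top_inv is not None and sr_top_inv != sr_bottom_inv


print(inversion_forced("+"))                       # matrix object wh-question
print(inversion_forced("-"))                       # topicalization
print(inversion_forced("+", is_final_root=False))  # embedded question
print(inversion_forced("+", has_invlink=False))    # subject wh-question
```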

E.3.2 Inversion, Part 2 
(displ-const): 

Possible values: [setl: +], [setl: — ] 

In the previous section, we saw how inversion is triggered using the (invlink), (inv), (wh)
features. Inversion involves movement of the verb from V to C. This movement process is
represented using the (displ-const) feature, which is used to simulate Multi-Component TAGs.[1]
The sub-value set1 indicates the inversion multi-component set; while there are not currently
any other uses of this mechanism, it could be expanded with other sets receiving different set
values.

The (displ-const) feature is used to ensure adjunction of two trees, which in this case 
are the auxiliary tree corresponding to the moved verb (S adjunct) and the auxiliary tree 
corresponding to the trace of the moved verb (VP adjunct). The following equations are used: 

(525) S_r.b:(displ-const set1) = -

(526) S.t:(displ-const set1) = +

(527) VP.b:(displ-const set1) = V.t:(displ-const set1)

(528) V.b:(displ-const set1) = +

(529) S_r.b:(displ-const set1) = VP.t:(displ-const set1)

E.4 Clause Type 

There are several features that mark clause type.[2] They are:
(mode) 

(passive): possible values are +/— 

1 The (displ-const) feature is also used in the ECM analysis. 

2 We have already seen one instance of a feature that marks clause-type: (extracted), which marks whether 
a certain S involves extraction or not. 
(mode): possible values are base, ger, ind, inf, imp, nom, ppart, prep, sbjnct
The (mode) feature of a verb in its root form is base. The (mode) feature of a verb in its 
past participial form is ppart, the (mode) feature of a verb in its progressive/gerundive form 
is ger, the (mode) feature of a tensed verb is ind, and the (mode) feature of a verb in the 
imperative is imp. 

nom is the (mode) value of AP/NP predicative trees headed by a null copula. prep is the
(mode) value of PP predicative trees headed by a null copula. Only the copula auxiliary
tree, some sentential complement verbs (such as consider), and raising verb auxiliary trees have
nom/prep as the (mode) feature specification of their foot node. This allows them, and only
them, to adjoin onto AP/NP/PP predicative trees with null copulas.

E.4.1 Auxiliary Selection 

The (mode) feature is also used to state the subcategorizational constraints between an aux- 
iliary verb and its complement. We model the following constraints: 
have takes past participial complements 
passive be takes past participial complements 
active be takes progressive complements 

modal verbs, do, and to take VPs headed by verbs in their base form as their complements. 

An auxiliary verb transmits its own mode to its root and imposes its subcategorizational
restrictions on its complement, i.e. on its foot node. For example, the auxiliary have in its
infinitival form involves the following equations:

(530) VP_r.b:(mode) = V.t:(mode)

(531) V.t:(mode) = base

(532) VP.b:(mode) = ppart
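
The selection constraints listed above can be sketched as a table lookup over a verb chain. This is an illustrative simplification, not the grammar's own mechanism: the class labels (`modal`, `passive_be`, and so on), the label `main`, and the function `chain_ok` are hypothetical names, and each verb form is reduced to a pair of its class and its own (mode).

```python
# (mode) that each auxiliary class requires of its complement (its foot node),
# taken from the list of constraints in the text.
COMPLEMENT_MODE = {
    "have": "ppart",
    "passive_be": "ppart",
    "active_be": "ger",
    "modal": "base",
    "do": "base",
    "to": "base",
}


def chain_ok(chain):
    """Check a verb chain given from the outermost auxiliary to the main verb.

    Each element is (verb_class, mode_of_this_form). Every auxiliary must find
    the (mode) it subcategorizes for on the next verb in the chain.
    """
    for (aux_class, _), (_, next_mode) in zip(chain, chain[1:]):
        if COMPLEMENT_MODE[aux_class] != next_mode:
            return False
    return True


# "to have eaten": to selects base "have", which selects ppart "eaten"
print(chain_ok([("to", "inf"), ("have", "base"), ("main", "ppart")]))
# "*to has eaten": "has" is ind, but to requires base
print(chain_ok([("to", "inf"), ("have", "ind"), ("main", "ppart")]))
```

In the grammar itself the same effect falls out of unification between the foot node of the auxiliary tree and the node it adjoins to, as in equations (530) to (532).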

(passive): This feature is used to ensure that passives only have be as their auxiliary. Passive
trees start out with their (passive) feature as +. This feature starts out at the level of the
verb and is percolated up to the level of the VP. This ensures that only auxiliary verbs whose
foot node has + as its (passive) feature can adjoin onto a passive. Passive trees have ppart
as the value of their (mode) feature, so the only auxiliary trees that need to be blocked
are trees whose foot nodes have ppart as the value of their (mode) feature.
There are two such trees: the be tree and the have tree. The be tree is fine because its foot
node has + as its (passive) feature, so both the (passive) and (mode) values unify; the have
tree is blocked because its foot node has - as its (passive) feature.
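
As a rough illustration of the argument above, the check can be written as unification of two flat feature sets. The dictionaries and the names `unify` and `can_adjoin` are hypothetical; only the feature values are taken from the text.

```python
def unify(a, b):
    """Unify two flat feature dictionaries; return None on a clash."""
    merged = dict(a)
    for name, value in b.items():
        if merged.setdefault(name, value) != value:
            return None
    return merged


# VP node of a passive tree: (mode) = ppart, (passive) = +
PASSIVE_VP = {"mode": "ppart", "passive": "+"}

# Foot nodes of the two auxiliary trees that select a ppart complement
BE_FOOT = {"mode": "ppart", "passive": "+"}
HAVE_FOOT = {"mode": "ppart", "passive": "-"}


def can_adjoin(foot, vp):
    """An auxiliary tree can adjoin only if its foot node unifies with the VP."""
    return unify(foot, vp) is not None


print(can_adjoin(BE_FOOT, PASSIVE_VP))    # be is allowed on a passive
print(can_adjoin(HAVE_FOOT, PASSIVE_VP))  # have is blocked by (passive)
```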

E.5 Relative Clauses 

Features that are peculiar to the relative clause system are: 
(select-mode), possible values are ind, inf, ppart, ger 
(rel-pron), possible values are ppart, ger, adj-clause
(rel-clause), possible values are +/— 
(select-mode): 

Comps are lexically specified for (select-mode). In addition, the (select-mode) feature of a 
Comp is equated to the (mode) feature of its sister S node by the following equation: 

(533) Comp.t:(select-mode) = S_t.t:(mode)

The lexical specifications of the Comps are shown below: 

• ec, Comp.t:(select-mode) = ind/inf/ger/ppart

• that, Comp.t:(select-mode) = ind

• for, Comp.t:(select-mode) = inf
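
The effect of equation (533) together with these lexical specifications can be sketched as a simple lookup. The table `SELECT_MODE` and the function `comps_for` are hypothetical names used only for illustration.

```python
# Lexical (select-mode) specifications of the three Comps, from the list above.
SELECT_MODE = {
    "ec": {"ind", "inf", "ger", "ppart"},
    "that": {"ind"},
    "for": {"inf"},
}


def comps_for(s_mode):
    """Comps whose (select-mode) can unify with the (mode) of the sister S."""
    return sorted(comp for comp, modes in SELECT_MODE.items() if s_mode in modes)


print(comps_for("ind"))    # ['ec', 'that']
print(comps_for("inf"))    # ['ec', 'for']
print(comps_for("ppart"))  # ['ec']
```

This covers only (select-mode); the additional restrictions on the null Comp are handled separately by (rel-pron).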
(rel-pron): 

There are additional constraints on where the null Comp ec can occur. The null Comp is not
permitted in cases of subject extraction unless there is an intervening clause or the relative
clause is a reduced relative (mode = ppart/ger).

To model this paradigm, the feature (rel-pron) is used in conjunction with the following 
equations. 

(534) S_r.t:(rel-pron) = Comp.t:(rel-pron)

(535) S_r.b:(rel-pron) = S_r.b:(mode)

(536) Comp.b:(rel-pron) = ppart/ger/adj-clause (for ec)

The full set of the equations above is only present in Comp substitution trees involving 
subject extraction. So the following will not be ruled out. 

(537) the toy [e_i [ec [ Dafna likes t_i ]]]

The feature mismatch induced by the above equations is not remedied by adjunction of just 
any S-adjunct because all other S-adjuncts are transparent to the (rel-pron) feature because 
of the following equation: 

(538) S_m.b:(rel-pron) = S_f.t:(rel-pron)
(rel-clause):

The XTAG analysis forces the adjunction of the determiner below the relative clause. This is 
done by using the (rel-clause) feature. The relevant equations are: 

(539) On the root of the RC: NP_r.b:(rel-clause) = +

(540) On the foot node of the Determiner tree: NP_f.t:(rel-clause) = -
E.6 Complementizer Selection

The following features are used to ensure the appropriate distribution of complementizers: 
(comp), possible values: that, if, whether, for, rel, inf_nil, ind_nil, nil
(assign-comp), possible values: that, if, whether, for, ecm, rel, ind_nil, inf_nil, none
(mode), possible values: ind, inf, sbjnct, ger, base, ppart, nom, prep 

(wh), possible values: +, — 

The value of the (comp) feature tells us what complementizer we are dealing with. The trees 
which introduce complementizers come specified from the lexicon with their (comp) feature 
and (assign-comp) feature. The (comp) of the Comp tree regulates what kind of tree goes
above the Comp tree, while the (assign-comp) feature regulates what kind of tree goes below.
For example, the following equations are used for that:

(541) S_c.b:(comp) = Comp.t:(comp)

(542) S_c.b:(wh) = Comp.t:(wh)

(543) S_c.b:(mode) = ind/sbjnct

(544) S_r.t:(assign-comp) = Comp.t:(comp)

(545) S_r.b:(comp) = nil

By specifying S_r.b:(comp) = nil, we ensure that complementizers do not adjoin onto other
complementizers. The root node of a complementizer tree always has its (comp) feature set to 
a value other than nil. 

Trees that take clausal complements specify with the (comp) feature on their foot node 
what kind of complementizer(s) they can take. The (assign-comp) feature of an S node is 
determined by the highest VP below the S node and the syntactic configuration the S node is 
in. 

E.6.1 Verbs with object sentential complements 

Finite sentential complements: 

(546) S_1.t:(comp) = that/whether/if/nil

(547) S_1.t:(mode) = ind/sbjnct or S_1.t:(mode) = ind

(548) S_1.t:(assign-comp) = ind_nil/inf_nil

The presence of an overt complementizer is optional. 
Non-finite sentential complements, do not permit for:

(549) S_1.t:(comp) = nil

(550) S_1.t:(mode) = inf

(551) S_1.t:(assign-comp) = ind_nil/inf_nil

Non-finite sentential complements, permit for:

(552) S_1.t:(comp) = for/nil

(553) S_1.t:(mode) = inf

(554) S_1.t:(assign-comp) = ind_nil/inf_nil

Cases like '*I want for to win' are independently ruled out due to a case feature clash 
between the acc assigned by for and the intrinsic case feature none on the PRO. 
Non-finite sentential complements, ECM: 

(555) S_1.t:(comp) = nil

(556) S_1.t:(mode) = inf

(557) S_1.t:(assign-comp) = ecm
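
The four complement frames of this subsection can be summarized in a small table. The sketch below is only a reading aid: the frame labels and the function `licenses` are hypothetical, (assign-comp) is left out, and only the (comp) and (mode) values stated in equations (546) to (557) are encoded.

```python
# (comp) and (mode) values allowed on the complement node S_1 in each frame.
FRAMES = {
    "finite": {"comp": {"that", "whether", "if", "nil"}, "mode": {"ind", "sbjnct"}},
    "nonfinite_no_for": {"comp": {"nil"}, "mode": {"inf"}},
    "nonfinite_for": {"comp": {"for", "nil"}, "mode": {"inf"}},
    "ecm": {"comp": {"nil"}, "mode": {"inf"}},
}


def licenses(frame, comp, mode):
    """True if a complement clause with these values unifies with the frame."""
    spec = FRAMES[frame]
    return comp in spec["comp"] and mode in spec["mode"]


print(licenses("finite", "that", "ind"))            # overt complementizer
print(licenses("finite", "nil", "ind"))             # complementizer omitted
print(licenses("nonfinite_no_for", "for", "inf"))   # for is not permitted
print(licenses("nonfinite_for", "for", "inf"))      # for is permitted
```

The ECM frame has the same (comp) and (mode) values as the frame that does not permit for; the two are distinguished by (assign-comp), which this sketch does not model.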

E.6.2 Verbs with sentential subjects 

The following contrast involving complementizers surfaces with sentential subjects: 

(558) *(That) John is crazy is likely. 

Indicative sentential subjects obligatorily have complementizers while infinitival sentential 
subjects may or may not have a complementizer. Also if is possible as the complementizer of 
an object clause but not as the complementizer of a sentential subject. 

(559) S_0.t:(comp) = that/whether/for/nil

(560) S_0.t:(mode) = inf/ind

(561) S_0.t:(assign-comp) = inf_nil

If the sentential subject is finite and a complementizer does not adjoin in, the (assign-comp)
feature of the S_0 node of the embedding clause and the root node of the embedded
clause will fail to unify. If a complementizer adjoins in, there will be no feature mismatch
because the root of the complementizer tree is not specified for the (assign-comp) feature.

The (comp) feature nil is split into two (assign-comp) features ind_nil and inf_nil to
capture the fact that there are certain configurations in which it is acceptable for an infinitival
clause to lack a complementizer but not acceptable for an indicative clause to lack a
complementizer.
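
The contrast in (558) can be traced with a minimal sketch. It assumes, as a simplification of the prose, that a bare indicative clause carries (assign-comp) = ind_nil on its root, that a bare infinitival clause carries inf_nil, and that a complementizer tree leaves the feature unspecified; the function name `sentential_subject_ok` is hypothetical.

```python
def sentential_subject_ok(mode, has_complementizer):
    """Check (assign-comp) unification at the subject node S_0.

    S_0 requires inf_nil (equation 561). An unspecified value, modeled as
    None, unifies with anything.
    """
    required = "inf_nil"
    if has_complementizer:
        root_assign_comp = None  # root of the complementizer tree: unspecified
    else:
        # Assumed values for bare clauses, following the ind_nil/inf_nil split.
        root_assign_comp = "ind_nil" if mode == "ind" else "inf_nil"
    return root_assign_comp is None or root_assign_comp == required


print(sentential_subject_ok("ind", has_complementizer=True))   # That John is crazy is likely.
print(sentential_subject_ok("ind", has_complementizer=False))  # *John is crazy is likely.
print(sentential_subject_ok("inf", has_complementizer=False))  # bare infinitival subject
```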
E.6.3 That-trace and for-trace effects

(562) Who_i do you think (*that) t_i ate the apple?

That-trace violations are blocked by the presence of the following equation:

(563) S_r.b:(assign-comp) = inf_nil/ind_nil/ecm

on the bottom of the S_r nodes of trees with extracted subjects (W0). The ind_nil feature
specification permits the above example while the inf_nil/ecm feature specification allows the
following examples to be derived:

(564) Who_i do you want [ t_i to win the World Cup]?

(565) Who_i do you consider [ t_i intelligent]?

The feature equation that rules out the that-trace filter violations also serves to rule out
the for-trace violations.



E.7 Determiner ordering 

(card), possible values are +, — 
(compl), possible values are +, — 
(const), possible values are +, — 
(decreas), possible values are +, — 
(definite), possible values are +, — 
(gen), possible values are +, — 
(quan), possible values are +, — 



For detailed discussion see Chapter 18 



E.8 Punctuation 



(punct) is a complex feature. It has the following as its subfeatures: 

(punct bal), possible values are dquote, squote, paren, nil 

(punct contains colon), possible values are +, — 

(punct contains dash), possible values are +, — 

(punct contains dquote), possible values are +, — 

(punct contains scolon), possible values are +, — 

(punct contains squote), possible values are +, — 

(punct struct), possible values are comma, dash, colon, scolon, none, nil 
(punct term), possible values are per, qmark, excl, none, nil 

For detailed discussion see the chapter on punctuation.
E.9 Conjunction

(conj), possible values are but, and, or, comma, scolon, to, disc, nil 
The (conj) feature is specified in the lexicon for each conjunction and is passed up to the root
node of the conjunction tree. If the conjunction is and, the root (agr num) is (plural), no
matter what the number of the two conjuncts. With or, the root (agr num) is equated to
the (agr num) feature of the right conjunct.
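
The number agreement rule for and and or can be written out as a short function. This is only a sketch of the two cases the text describes; the function name `root_number` and the value labels "singular" and "plural" are illustrative choices, and other conjunctions are deliberately left unhandled.

```python
def root_number(conj, left_num, right_num):
    """(agr num) on the root of a conjunction tree, for 'and' and 'or' only."""
    if conj == "and":
        return "plural"      # always plural, whatever the conjuncts are
    if conj == "or":
        return right_num     # equated with the right conjunct
    raise ValueError(f"agreement for {conj!r} is not described in the text")


print(root_number("and", "singular", "singular"))  # plural
print(root_number("or", "plural", "singular"))     # singular
```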

The (conj)=disc feature is only used at the root of the βCONJs tree. It blocks the
adjunction of one βCONJs tree on another. The following equations are used, where S_r is
the substitution node and S_c is the root node:

(566) S_r.t:(conj) = disc

(567) S_c.b:(conj) = and/or/but/nil

E.10 Comparatives 

(compar), possible values are +, — 
(equiv), possible values are +, — 
(super), possible values are +, — 

For detailed discussion see the chapter on comparatives.



E.11 Control

(control) has no value and is used only for indexing purposes. The root node of every clausal
tree has its (control) feature coindexed with the control feature of its subject. This allows
adjunct control to take place. In addition, clauses that take infinitival clausal complements
have the control feature of their subject/object coindexed with the control feature of their
complement clause S, depending upon whether they are subject control verbs or object control
verbs respectively.

E.12 Other Features

(neg), possible values are +, -

Used for controlling the interaction of negation and auxiliary verbs.
(pred), possible values are +, -

The (pred) feature is used in the following tree families: Tnx0N1.trees and Tnx0nx1ARB.trees.
In the Tnx0N1.trees family, the following equations are used:

for αW1nx0N1:

(568) NP_1.t:(pred) = +

(569) NP_1.b:(pred) = +

(570) NP.t:(pred) = +

(571) N.t:(pred) = NP.b:(pred) 

This is the only tree in this tree family to use the (pred) feature. 

The other tree family where the (pred) feature is used is Tnx0nx1ARB.trees. Within this
family, this feature (and the following equations) are used only in the αW1nx0nx1ARB tree.

(572) AdvP_1.t:(pred) = +

(573) AdvP_1.b:(pred) = +

(574) NP.t:(pred) = +

(575) AdvP.b:(pred) = NP.t:(pred)
(pron), possible values are +, - 

This feature indicates whether a particular NP is a pronoun or not. Certain constructions which
do not permit pronouns use this feature to block pronouns.
(tense), possible values are pres, past 

It does not seem to be the case that the (tense) feature interacts with other features / syntactic 
processes. It comes from the lexicon with the verb and is transmitted up the tree in such a 
way that the root S node ends up with the tense feature of the highest verb in the tree. The 
equations used for this purpose are: 

(576) S_r.b:(tense) = VP.t:(tense)

(577) VP.b:(tense) = V.t:(tense) 

(trans), possible values are +, — 
Many but not all English verbs can anchor both transitive and intransitive trees. 

(578) The sun melted the ice cream. 

(579) The ice cream melted. 

(580) Elmo borrowed a book. 

(581) * A book borrowed. 

Transitive trees have the (trans) feature of their anchor set to + and intransitive trees 
have the (trans) feature of their anchor set to -. Verbs such as melt which can occur in 
both transitive and intransitive trees come unspecified for the (trans) feature from the lexicon. 
Verbs which can only occur in transitive trees e.g. borrow have their (trans) feature specified 
in the lexicon as + thus blocking their anchoring of an intransitive tree. 
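
The blocking effect of (trans) can be illustrated with a two-entry lexicon. The dictionary `LEXICON` and the function `can_anchor` are hypothetical; None stands for a verb that comes unspecified for (trans) from the lexicon.

```python
# (trans) value each verb brings from the lexicon; None means unspecified.
LEXICON = {
    "melt": None,    # can anchor both transitive and intransitive trees
    "borrow": "+",   # transitive only
}


def can_anchor(verb, tree_trans):
    """True if the verb's (trans) value unifies with that of the tree anchor."""
    lexical = LEXICON[verb]
    return lexical is None or lexical == tree_trans


print(can_anchor("melt", "+"))    # The sun melted the ice cream.
print(can_anchor("melt", "-"))    # The ice cream melted.
print(can_anchor("borrow", "+"))  # Elmo borrowed a book.
print(can_anchor("borrow", "-"))  # *A book borrowed.
```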



Appendix F 

Evaluation and Results 



In this appendix we describe various evaluations of the XTAG grammar. Some of these
evaluations were done on an earlier version of the XTAG grammar (the 1995 release), while
others were done more recently. We will try to indicate in each section which version was used.



F.1 Parsing Corpora

In the XTAG project, we have used corpus analysis in two main ways: (1) to measure the 
performance of the English grammar on a given genre and (2) to identify gaps in the grammar. 

The second type of evaluation involves performing detailed error analysis on the sentences 
rejected by the parser, and we have done this several times on WSJ and Brown data. 

Based on the results of such analysis, we prioritize upcoming grammar development efforts. 



The results of a recent error analysis are shown in Table F.1. The table does not show errors
in parsing due to mistakes made by the POS tagger, which contributed the largest number of
errors: 32. At this point, we have added a treatment of punctuation to handle #1, an analysis of
time NPs (#2), a large number of multi-word prepositions (part of #4), gapless relative clauses
(#7), bare infinitives (#14) and have added the missing subcategorization (#3) and missing
lexical entry (#12). We are in the process of extending the parser to handle VP coordination
(#9) (see the section on recent work to handle VP and other predicative coordination). We
find that this method of error analysis is very useful in focusing grammar development in a
productive direction.

To ensure that we are not losing coverage of certain phenomena as we extend the gram- 
mar, we have a benchmark set of grammatical and ungrammatical sentences from this technical 
report. We parse these sentences periodically to ensure that in adding new features and con- 
structions to the grammar, we are not blocking previous analyses. There are approximately 
590 example sentences in this set. 



F.2 TSNLP 

In addition to corpus-based evaluation, we have also run the English Grammar on the Test 
Suites for Natural Language Processing (TSNLP) English corpus [Lehmann et al., 1996]. The
corpus is intended to be a systematic collection of English grammatical phenomena, including 
Rank   No. of errors   Category of error
#1     11              Parentheticals and appositives
#2     8               Time NP
#3     8               Missing subcat
#4     7               Multi-word construction
#5     6               Ellipsis
#6     6               Not sentences
#7     3               Relative clause with no gap
#8     2               Funny coordination
#9     2               VP coordination
#10    2               Inverted predication
#11    2               Who knows
#12    1               Missing entry
#13    1               Comparative?
#14    1               Bare infinitive

Table F.1: Results of Corpus Based Error Analysis



complementation, agreement, modification, diathesis, modality, tense and aspect, sentence and 
clause types, coordination, and negation. It contains 1409 grammatical sentences and phrases 
and 3036 ungrammatical ones. 



Error Class        %       Example
POS Tag            19.7%   She adds to/V it, He noises/N him abroad
Missing lex item   43.3%   used as an auxiliary V, calm NP down
Missing tree       21.2%   should've, bet NP NP S, regard NP as Adj
Feature clashes    3%      My every firm, All money
Rest               12.8%   approx, e.g.

Table F.2: Breakdown of TSNLP Errors



There were 42 examples which we judged ungrammatical and removed from the test corpus.
These were sentences with conjoined subject pronouns, where one or both were accusative,
e.g. Her and him succeed. Overall, we parsed 61.4% of the 1367 remaining sentences and
phrases. The errors were of various types, broken down in Table F.2. As with the error analysis
described above, we used this information to help direct our grammar development efforts. It
also highlighted the fact that our grammar is heavily slanted toward American English: our
grammar did not handle dare or need as auxiliary verbs, and there were a number of very British
particle constructions, e.g. She misses him out.
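
The figures in this paragraph can be cross-checked with a few lines of arithmetic. The report gives only the rounded percentage, so the number of parsed items is recovered here as a small range rather than a single value; the variable names are illustrative.

```python
grammatical = 1409   # grammatical sentences and phrases in the TSNLP corpus
removed = 42         # examples judged ungrammatical and removed
remaining = grammatical - removed

reported_pct = 61.4  # percentage parsed, as reported (rounded to one decimal)

# Counts of parsed items that are consistent with the rounded percentage.
candidates = [
    n for n in range(remaining + 1)
    if abs(100 * n / remaining - reported_pct) < 0.05
]

print(remaining)   # 1367
print(candidates)  # [839, 840]
```

So roughly 840 of the 1367 items received a parse, leaving a little over 520 failures to be distributed over the error classes of Table F.2.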

One general problem with the test-suite is that it uses a very restricted lexicon, and if 
there is one problematic lexical item it is likely to appear a large number of times and cause a 
disproportionate amount of grief. Used to appears 33 times and we got all 33 wrong. However, 
it must be noted that the XTAG grammar has analyses for syntactic phenomena that were not 
represented in the TSNLP test suite such as sentential subjects and subordinating clauses among 
others. This effort was, therefore, useful in highlighting some deficiencies in our grammar, but 
did not provide the same sort of general evaluation as parsing corpus data.



F.3 Chunking and Dependencies in XTAG Derivations 



We evaluated the XTAG parser for the text chunking task [Abney, 1991]. In particular, we
compared NP chunks and verb group (VG) chunks[1] produced by the XTAG parser with the
NP and VG chunks from the Penn Treebank [Marcus et al., 1993]. The test involved 940
sentences of length 15 words or less from sections 17 to 23 of the Penn Treebank, parsed using
the XTAG English grammar. The results are given in Table F.3.





            NP Chunking   VG Chunking
Recall      82.15%        74.51%
Precision   83.94%        76.43%

Table F.3: Text Chunking performance of the XTAG parser



System                                           Training Size   Recall   Precision
Ramshaw & Marcus                                 Baseline        81.9%    78.2%
Ramshaw & Marcus (without lexical information)   200,000         90.7%    90.5%
Ramshaw & Marcus (with lexical information)      200,000         92.3%    91.8%
Supertags                                        Baseline        74.0%    58.4%
Supertags                                        200,000         93.0%    91.8%
Supertags                                        1,000,000       93.8%    92.5%

Table F.4: Performance comparison of the transformation based noun chunker and the supertag
based noun chunker



As described earlier, the results cannot be directly compared with other results in chunking
such as in [Ramshaw and Marcus, 1995] since we do not train from the Treebank before testing.
However, in earlier work, text chunking was done using a technique called supertagging [Srinivas,
1997b] (which uses the XTAG English grammar), which can be trained from the Treebank.
The comparative results of text chunking between supertagging and other methods of chunking
are shown in Table F.4.[2]

We also performed experiments to determine the accuracy of the derivation structures
produced by XTAG on WSJ text, where the derivation tree produced after parsing with XTAG is
interpreted as a dependency parse. We took sentences that were 15 words or less from the Penn
Treebank [Marcus et al., 1993]. The sentences were collected from sections 17-23 of the
Treebank. 9891 of these sentences were given at least one parse by the XTAG system. Since XTAG
typically produces several derivations for each sentence we simply picked a single derivation



1 We treat a sequence of verbs and verbal modifiers, including auxiliaries, adverbs, modals as constituting a 
verb group. 

2 It is important to note in this comparison that the supertagger uses lexical information on a per word basis 
only to pick an initial set of supertags for a given word. 
from the list for this evaluation. Better results might be achieved by ranking the output of the
parser using the sort of approach described in [Srinivas et al., 1995].

There were some striking differences in the dependencies implicit in the Treebank and those 
given by XTAG derivations. For instance, often a subject NP in the Treebank is linked with the 
first auxiliary verb in the tree, either a modal or a copular verb, whereas in the XTAG derivation, 
the same NP will be linked to the main verb. Also XTAG produces some dependencies within 
an NP, while a large number of words in NPs in the Treebank are directly dependent on the 
verb. To normalize for these facts, we took the output of the NP and VG chunker described 
above and accepted as correct any dependencies that were completely contained within a single 
chunk. 

For example, for the sentence Borrowed shares on the Amex rose to another record, the 
XTAG and Treebank chunks are shown below. 



XTAG chunks: 
[Borrowed shares] [on the Amex] [rose] 
[to another record] 
Treebank chunks : 
[Borrowed shares on the Amex] [rose] 
[to another record] 



Using these chunks, we can normalize for the fact that in the dependencies produced by
XTAG borrowed is dependent on shares (i.e. in the same chunk) while in the Treebank borrowed
is directly dependent on the verb rose. That is to say, we are looking at links between chunks,
not between words. The dependencies for the sentence are given below.



XTAG dependency      Treebank dependency

Borrowed::shares     Borrowed::rose
shares::rose         shares::rose
on::shares           on::shares
the::Amex            the::Amex
Amex::on             Amex::on
rose::NIL            rose::NIL
to::rose             to::rose
another::record      another::record
record::to           record::to



After this normalization, testing simply consisted of counting how many of the dependency 
links produced by XTAG matched the Treebank dependency links. Due to some tokenization 
and subsequent alignment problems we could only test on 835 of the original 9891 parsed 
sentences. There were a total of 6135 dependency links extracted from the Treebank. The 
XTAG parses also produced 6135 dependency links for the same sentences. Of the dependencies 
produced by the XTAG parser, 5165 were correct giving us an accuracy of 84.2%. 
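
The normalization and scoring procedure can be sketched on the example sentence. The code below is a reconstruction under stated assumptions, not the evaluation script used for the report: it treats a link as correct if it matches the Treebank link or if both ends fall inside the same XTAG chunk, and all names (`xtag`, `treebank`, `link_is_correct`) are hypothetical.

```python
# word -> head, taken from the two dependency lists above (None stands for NIL)
xtag = {
    "Borrowed": "shares", "shares": "rose", "on": "shares", "the": "Amex",
    "Amex": "on", "rose": None, "to": "rose", "another": "record", "record": "to",
}
treebank = dict(xtag, Borrowed="rose")  # the only link that differs

xtag_chunks = [
    ["Borrowed", "shares"], ["on", "the", "Amex"], ["rose"],
    ["to", "another", "record"],
]
chunk_id = {word: i for i, chunk in enumerate(xtag_chunks) for word in chunk}


def link_is_correct(word):
    """A link counts if it matches the Treebank or stays inside one chunk."""
    head = xtag[word]
    if head == treebank[word]:
        return True
    return head is not None and chunk_id[word] == chunk_id[head]


n_correct = sum(link_is_correct(word) for word in xtag)
print(n_correct, "of", len(xtag))           # 9 of 9

# Reported totals for the 835 test sentences
print(round(100 * 5165 / 6135, 1))          # 84.2
```

With this normalization the Borrowed::shares link is accepted, since both words lie in the chunk [Borrowed shares], so all nine links of the example count as correct.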
F.4 Comparison with IBM



The evaluation in this section was done with the earlier 1995 release of the grammar. This
section describes an experiment to measure the crossing bracket accuracy of the XTAG-parsed
IBM-manual sentences. In this experiment, XTAG parses of 1100 IBM-manual sentences were
ranked using certain heuristics. The ranked parses were compared[3] against the
bracketing given in the Lancaster Treebank of IBM-manual sentences.[4] Table F.5 shows the
results obtained by XTAG in this experiment. It also shows the results of the latest IBM
statistical grammar [Jelinek et al., 1994] on the same genre of sentences. Only the
highest-ranked parse of each system was used for this evaluation. Crossing Bracket Accuracy is
the percentage of sentences with no pairs of brackets crossing
the Treebank bracketing (e.g. ( ( a b ) c ) has a crossing bracket measure of one if compared
to ( a ( b c ) ) ). Recall is the ratio of the number of correct constituents in the XTAG parse
to the number of constituents in the corresponding Treebank sentence. Precision is the ratio of the
number of correct constituents to the total number of constituents in the XTAG parse.
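
The crossing bracket measure for the example in the text can be computed with a short sketch. Constituents are represented as (start, end) spans over word positions; this representation and the function names are assumptions made for illustration, not the interface of the parseval program used in the experiment.

```python
def crosses(x, y):
    """Two spans cross if they overlap but neither contains the other."""
    (a, b), (c, d) = x, y
    return a < c < b < d or c < a < d < b


def crossing_count(test_spans, gold_spans):
    """Number of test constituents that cross at least one gold constituent."""
    return sum(
        1 for span in test_spans
        if any(crosses(span, gold) for gold in gold_spans)
    )


# Words a, b, c sit at positions 0, 1, 2; spans are half-open intervals.
test = [(0, 2), (0, 3)]   # ( ( a b ) c )
gold = [(1, 3), (0, 3)]   # ( a ( b c ) )

print(crossing_count(test, gold))  # 1
```

The span (0, 2) for ( a b ) overlaps the gold span (1, 3) for ( b c ) without nesting, which gives the crossing bracket measure of one mentioned above.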



System                    # of sentences   Crossing Bracket Accuracy   Recall   Precision
XTAG                      1100             81.29%                      82.34%   55.37%
IBM Statistical grammar   1100             86.20%                      86.00%   85.00%

Table F.5: Performance of XTAG on IBM-manual sentences



As can be seen from Table F.5, the precision figure for the XTAG system is considerably
lower than that for IBM. For the purposes of comparative evaluation against other systems,
we had to use the same crossing-brackets metric, though we believe that the crossing-brackets
measure is inadequate for evaluating a grammar like XTAG. There are two reasons for the
inadequacy. First, the parse generated by XTAG is much richer in its representation of the
internal structure of certain phrases than those present in manually created treebanks (e.g.
IBM: [N your personal computer], XTAG: [NP [G your] [N [N personal] [N computer]]]). This
is reflected in the number of constituents per sentence, shown in the last column of Table F.6.[5]



System              Sent. Length   # of sent   Av. # of words/sent   Av. # of Constituents/sent
XTAG                1-10           654         7.45                  22.03
                    1-15           978         9.13                  30.56
IBM Stat. Grammar   1-10           447         7.50                  4.60
                    1-15           883         10.30                 6.40

Table F.6: Constituents in XTAG parse and IBM parse
A second reason for considering the crossing bracket measure inadequate for evaluating 



3 We used the parseval program written by Phil Harison (phil@atc.boeing.com). 

4 The Treebank was obtained through Salim Roukos (roukos@watson.ibm.com) at IBM. 

5 We are aware of the fact that increasing the number of constituents also increases the recall percentage.
However we believe that this is a legitimate gain.
XTAG is that the primary structure in XTAG is the derivation tree from which the bracketed
tree is derived. Two identical bracketings for a sentence can have completely different derivation 
trees (e.g. kick the bucket as an idiom vs. a compositional use). A more direct measure of the 
performance of XTAG would evaluate the derivation structure, which captures the dependencies 
between words. 



F.5 Comparison with Alvey 



The evaluation in this section was done with the earlier 1995 release of the grammar. This 
section compares XTAG to the Alvey Natural Language Tools (ANLT) Grammar. We parsed 
the set of LDOCE Noun Phrases presented in Appendix B of the technical report [Carroll,
1993] using XTAG. Table F.7 summarizes the results of this experiment. A total of 143 noun
phrases were parsed. The NPs which did not have a correct parse in the top three derivations 
were considered failures for either system. The maximum and average number of derivations 
columns show the highest and the average number of derivations produced for the NPs that 
have a correct derivation in the top three. We show the performance of XTAG both with and 
without the tagger since the performance of the POS tagger is significantly degraded on the 
NPs because the NPs are usually shorter than the sentences on which it was trained. It would 
be interesting to see if the two systems performed similarly on a wider range of data. 



System                           # of NPs   # parsed   % parsed   Maximum derivations   Average derivations
ANLT Parser                      143        127        88.81%     32                    4.57
XTAG Parser with POS tagger      143        93         65.03%     28                    3.45
XTAG Parser without POS tagger   143        120        83.91%     28                    4.14

Table F.7: Comparison of XTAG and ANLT Parser



F.6 Comparison with CLARE 

The evaluation in this section was done with the earlier 1995 release of the grammar. This
section compares the performance of XTAG against that of the CLARE-2 system [Alshawi
et al., 1992] on the ATIS corpus. Table F.8 shows the performance results. The percentage
parsed column for both systems represents the percentage of sentences that produced any parse.
It must be noted that the performance result shown for CLARE-2 is without any tuning of the
grammar for the ATIS domain. The performance of CLARE-3, a later version of the CLARE
system, is estimated to be 10% higher than that of the CLARE-2 system.[6]

In an attempt to compare the performance of the two systems on a wider range of sentences
(from similar genres), we provide in Table F.9 the performance of CLARE-2 on LOB corpus and



6 When CLARE-3 is tuned to the ATIS domain, performance increases to 90%. However XTAG has not been 
tuned to the ATIS domain. 
System    Mean length   % parsed
CLARE-2   6.53          68.50%
XTAG      7.62          88.35%

Table F.8: Performance of CLARE-2 and XTAG on the ATIS corpus



the performance of XTAG on the WSJ corpus. The performance was measured on sentences of 
up to 10 words for both systems. 



System    Corpus   Mean length   % parsed
CLARE-2   LOB      5.95          53.40%
XTAG      WSJ      6.00          55.58%

Table F.9: Performance of CLARE-2 and XTAG on LOB and WSJ corpus respectively